
Secure File Management Using the Public Cloud

A Masters in Cybersecurity Practicum Project

Cecil Thornhill

ABSTRACT

This project explores the history and evolution of document management tools through the emergence of cloud computing, and documents the development of a basic cloud-based web system for the secure transmission and storage of confidential information on a public cloud, following guidance for federal computing systems.
Contents

Introduction
Background of the Driving Problem – Ur to the Cloud
The Cloud in Context – A New Way to Provide IT
Cloud Transformation Drivers
The Federal Cloud & the Secure Cloud Emerge
Designing a Project to Demonstrate Using the Cloud
Planning the Work and Implementing the Project Design
Findings, Conclusions and Next Steps
References
Source Code Listings
Test Document
Introduction

This paper describes the design and development of a system to support the encrypted transfer of confidential and sensitive Personally Identifiable Information (PII) and Personal Healthcare Information (PHI) to a commercial cloud-based object storage system. This work was undertaken as a Practicum project for the Masters in Cybersecurity program, and as such was implemented within the time limits of a semester session and was completed by a single individual. This prototype represents a basic version of a web-based system implemented on a commercial cloud-based object storage system. The prototype demonstrates an approach to implementation suitable for use by government or private business for the collection of data subject to extensive regulation, such as HIPAA/HITECH healthcare data or critical financial data.

A general review of the context of the subject area and the history of document management is provided below, along with a review of the implementation efforts. Findings and results are provided both for the implementation efforts and for the actual function of the system. Due to the restricted time available for this project, the scope was limited to fit the schedule. Only basic features were implemented per the design guidance documented below. To explore future options for expansion of the project, several experiments designed to further analyze the system's capacity and performance are outlined below. These options represent potential future directions to further explore this aspect of secure delivery of information technology functions using cloud-based platforms.

Background of the Driving Problem – Ur to the Cloud

The need to exchange documents containing important information between individuals and enterprises is a universal necessity in any organized human society. Since the earliest highly organized human cultures, information about both private and government activities has been recorded on physical media and exchanged between parties1. Various private and government couriers were used to exchange documents in the ancient and classical world. In the West, this practice of private courier service continued after the fall of Rome. The Catholic Church acted as a primary conduit for document exchange and was itself a prime consumer of document exchange services2. In the West, after the Renaissance, the growth of both the modern nation state and the emergence of early commerce and capitalism were both driven by and supportive of the growth of postal services open to private interests. The needs of commerce quickly came to dominate the traffic and shape the evolution of document exchange via physical media3. In the early United States the critical role of publicly accessible document exchange was widely recognized by the founders of the new democracy. The Continental Congress in 1775 established the US Postal
Service to provide document communications services to the emerging new government prior to the Declaration of Independence4. As a new and modern nation, cost-effective, efficient document exchange services from the new post office were essential to the growth of the US economy5.

The growth of the US as a political and economic power unfolds in parallel with the Industrial Revolution in England and Europe, as well as the overall transition of the Western world to what can be described as modern times. New science, new industry and commerce, and new political urgencies all drive the demand for the transmission of documents and messages in ever faster and more cost-effective forms6. It is within this accelerating technical and commercial landscape that the digital age is born in the US, when Samuel Morse publicly introduces the telegraph to the world in 1844 with the famous question "What Hath God Wrought?" sent from the US Capitol to the train station in Baltimore, Maryland7. Morse's demonstration was the result of years of experiment and effort by hundreds of people in scores of countries, but it has come to represent the singular moment of creation for the digital era and marks the beginning of the struggle to understand and control the issues stemming from document transmission in the digital realm. All of the issues we face emerge from this time forward, such as:

• Translation of document artifacts created by people into digital formats and the creation of human-readable documents from digital intermediary formats.
• The necessity to authenticate the origin of identical digital data sets and to manage the replication of copies.
• The need to enforce privacy and security during the transmission process across electronic media.

Many of these problems have similar counterparts in the physical document exchange process, but some, such as the issue of an indefinite number of identical copies, were novel, and all these issues require differing solutions for a physical or digital environment8.

The telegraph was remarkably successful due to its compelling commercial, social and military utility. As Du Boff and Yates note in their research:

"By 1851, only seven years after the inauguration of the pioneer Baltimore-to-Washington line, the entire eastern half of the US up to the Mississippi River was connected by a network of telegraph wires that made virtually instantaneous communication possible. By the end of another decade, the telegraph had reached the west coast, as well9, 10."

The reach of the telegraph soon extended well beyond the borders of the US, and even beyond the shores of any one continent. In 1858 Queen Victoria sent President
Buchanan a congratulatory telegram to mark the successful completion of the Anglo-American transatlantic cable project11. Digital documents now had global scope, and the modern era of document exchange and management had truly arrived.

The US Civil War would be largely shaped by the technical impact of the telegraph and railroad. Both the North and South ruthlessly exploited advances in transportation and communication during the conflict12. Centralization of information management and the need for confidentiality, integrity, and availability all emerged as issues. Technical tools like encryption rapidly became standard approaches to meeting these needs13. The patterns of technical utilization during the war provided a model for future civil government and military use of digital communications and for digital document transmission. The government's use patterns then became a lesson in the potential for commercial use of the technology. Veterans of the war went on to utilize the telegraph as an essential tool in post-war America's business climate. Rapid communication and a faster pace in business became the norm as the US scaled up its industry in the late 19th century. Tracking and managing documents became an ever-increasing challenge, along with other aspects of managing the growing and geographically diverse business enterprises emerging.

By the turn of the 20th century the telegraph provided a thriving and vital alternative to the physical transmission of messages and documents. Most messages and documents to be sent by telegraph were either entered directly as digital signals sent originally by telegraphy, or transcribed by a human who read and re-entered the data from the document. However, all of the modern elements of digital document communication existed and were in some form of use, including the then under-utilized facsimile apparatus14.

As the 20th century progresses, two more 19th century technologies which would come to have a major impact on document interchange and management would continue to evolve in parallel with the telegraph: mechanical/electronic computation and photography. Mechanical computation, tracing its origin from Babbage's Analytical Engine, would come to be indispensable in tabulating and managing the data needed to run an increasingly global technical and industrial society15. Photography not only provided a new and accurate record of people and events, but with the development of fine-grained films in the 20th century, microfilm would come to be the champion of high-density document, and hence information, storage media. Despite some quality drawbacks, the sheer capacity and over 100-year shelf life of microfilm made it very attractive as a document storage tool. By the 1930's microfilm had become the bulk document storage medium of choice for publications and libraries as well as the federal government16.
The experience with early electronic computers in World War II and familiarity with microfilm made merging the two technologies appear as a natural next step to forward thinkers. In 1945 Vannevar Bush, the wartime head of the Office of Scientific Research and Development (OSRD), would propose the Memex. Memex was designed as an associative information management device combining electronic computer-like functions with microfilm storage, but it was not fully digital nor was it networked17. In many ways this project pointed the way to modern information management tools that were introduced in the 1960's but not fully realized until the end of the 20th century. (Bush, V. (1945). As We May Think. The Atlantic Monthly, 176(1), 101-108.)

The commercial release and rapid adoption of modern computer systems such as the groundbreaking IBM 360 in the 1960's, and the series of mini-computer systems in the 1970's such as the DEC VAX, greatly expanded the use of digital documents and created the modern concept of a searchable database filled with data from these documents. The development of electronic document publishing systems in the 1980's created a "feedback loop" that allowed digital data to go back into printed documents, generating a need to manage these new documents with the computers used to generate them from the data and user input. The growth of both electronic data exchange and document scanning in the 1990's began to replace microfilm. Many enterprises realized the need to eliminate paper and work only with electronic versions of customer documents. The drive for more efficient and convenient delivery of services, as well as the need to reduce the cost of managing paper records, continues to drive the demand for electronic document management tools. By the 1990's, large-scale document management and document search systems such as FileNet and its competitors began to emerge into the commercial market.

The emergence of fully digital document management systems in widespread use by the turn of the 21st century brings the story of document management into the present day, where we see a predominance of electronic document systems and an expectation of quick and universal access to both the data and the documents as artifacts in every aspect of life, including private and commercial activities and interactions with the government. As the demand for large electronic document management infrastructures grew, the scale of these systems and related IT infrastructure continued to expand, placing significant cost stress on the enterprise. There was a boom in the construction of data centers to house the infrastructure. At the same time that the physical data centers for enterprises were expanding, a new model of enterprise computing was being developed: Cloud Computing.
The Cloud in Context – A New Way to Provide IT

In 1999 Salesforce popularized the idea of providing enterprise applications infrastructure via a website, and by 2002 Amazon started delivering computation and storage to enterprises via the Amazon Web Services platform. Google, Microsoft and Oracle, as well as a host of other major IT players, quickly followed with their own versions of cloud computing options. These new cloud services offered the speed and convenience of web-based technology with the features of a large data center. An enterprise could lease and provision cloud resources with little time and no investment in up-front costs for procurement of system hardware. By 2009 options for cloud computing were plentiful, but there was as yet little generally accepted evidence about the reasons for the shift or even the risks and benefits18.

What made cloud systems different from earlier timeshare approaches and data center leasing of physical space? Why were they more compelling than renting or leasing equipment? While a detailed examination of all the concepts and considerations leading to the emergence of cloud computing is outside the scope of this paper, there is a broad narrative that can be suggested based on prior historical study of technological change from steam to electricity and then to centralized generation systems. While the analogies may not all be perfect, they can be useful tools in contextualizing the question of "why cloud computing now?"

In the 19th century, the development of practical steam power drove a revolution in technical change. The nature of mechanical steam power was such that the steam engine was intrinsically local, as mechanical power is hard to transmit across distance19. When electrical generation first emerged at the end of the 19th century, the first electrical applications tended to reproduce this pattern. Long distance distribution of power was hard to achieve, and so many facilities used generators for local power production20. The nature of electricity was quite different from mechanical power, and so breakthroughs in distribution were rapid. Innovators such as Tesla and Westinghouse quickly developed long distance transmission of electricity. This electrical power distribution breakthrough allowed the rapid emergence of very large centralized power stations; the most significant of these early centers was the Niagara hydroelectric station21. Today, most power is generated in large central stations. Power is transmitted via a complex national grid system. The distribution grid is an amalgam of local and regional grids22. However, this was not the end of the demand for local generators. In fact, more use of electricity led to more demand for local generators, but for non-primary use cases such as emergency power, or for alternate use cases such as remote or temporary power supplies23, 24. The way local generation was used changed with the shift to the power grid, in ways that parallel the shift from local data centers to cloud-based data center
operations. While it is true that early computers were more centralized, since the mid 70's and the emergence of the mini-computer, and then the micro-computer that came to prominence in the 80's, a much more distributed pattern emerged. The mainframe and mini-computer became the nucleus of emerging local data centers in every enterprise. As Local Area Networks emerged they reinforced the role of the local data center as a hub for the enterprise. Most enterprises in the 1980's and 90's had some form of local data center, in a pattern not totally dissimilar to that of early electric generators. As the networks grew in scale and speed, they began to shift the patterns of local computing to emphasize connectivity and a wider geographic area of service. When the commercial Internet emerged in the 1990's the stage was set for a radical change, in much the same way that the development of efficient electrical distribution across a grid changed the pattern of an earlier technical system. Connectivity became the driving necessity for an enterprise competing to reach its supply chain and customers via the new network tools.

By the turn of the 21st century, firms like Google and Amazon were experimenting with what they came to consider a new type of computer, the Warehouse Scale Computer. By 2009 this was a documented, practical new tool, as noted in Google's landmark paper "The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines" (Luiz André Barroso and Urs Hölzle, Google Inc., 2009). This transition can be considered as similar to the move to centrally generated electrical power sent out via the grid. In a similar manner it will not erase local computer resources, but it will alter their purpose and use cases25.

As was the case for the change to more centralized electrical generation, by the early 21st century there was considerable pressure on IT managers to consider moving from local data centers to cloud-based systems. For both general computing and for document management systems this pressure tends to come from two broad source categories: technical/process drivers and cost drivers. Technical drivers include the savings in deployment time for servers and systems at all points in the systems development lifecycle, and cost drivers are reflected in the reduced operational costs provided by cloud systems26.

Cloud Transformation Drivers

Technical and process drivers also include considerations such as functional performance and flexible response to business requirements. The need to be responsive in short time frames, as well as to provide the latest trends in functional support for the enterprise business users and customers, favors the quick start-up times of cloud-based IT services. The wide scope of the business use case drivers is beyond the scope of this paper, but is important to note.
Cost drivers favoring cloud-based IT services are more easily understood in the context of document management as discussed in this paper. Moving to cloud-based servers and storage for document management systems represents an opportunity to reduce the Total Cost of Ownership (TCO) of the IT systems. These costs include not only the cost to procure the system components but also the cost to operate them in a managed environment controlled by the enterprise. Even if it appears there is no compelling functional benefit to be obtained by the use of cloud-based systems, the cost factors alone are typically compelling as a driver for the decision to move document management systems from local servers and storage to the cloud.

As an example of the potential cost drivers, Amazon and other vendors offer a number of TCO comparison tools that illustrate the case for cost savings from cloud-based operations. While the vendors clearly have a vested interest in the promotion of cloud-based operations, these tools provide a reasonable starting point for an "apples to apples" estimate of costs for local CPU and storage vs. cloud CPU and storage options. Considering that the nature of document systems is not especially CPU-intensive, but is very demanding of storage subsystems, this cost comparison is a good starting point, as it tends to reduce the complexity of the pricing model. For purposes of comparison here, the Amazon TCO model will be discussed below to examine the storage cost implications for a small (1 TB) document store. The default model from Amazon starts with an assumption of 1 TB of data that requires "hot" storage (fast access for on-demand application support) and full plus incremental backups, and that grows by 1 TB per month in size27. This is a good fit for a modest document storage system and can be considered a "ballpark" baseline. (Total Cost of Ownership. (2016). Retrieved July 06, 2016, from http://www.backuparchive.awstcocalculator.com/)

Amazon's tool estimates this storage to cost about $308,981 per year for a local SAN backed up to tape. The tool estimates the same storage using the cloud option to cost about $37,233 for a year. The cost of local hot storage alone is estimated at $129,300, versus $29,035 for Amazon S3 storage. Based on the author's past experience in federal IT document management systems, these local storage costs are generally within what could be considered reasonably relevant and accurate TCO ranges for private or federal data center storage. Processing cost estimates for the servers required in the storage solution are also within the range of typical mid-size to large data center costs, based on the author's experience over the past 8 years with federal and private data center projects. Overall, the Amazon tool does appear to produce estimates of local costs that can be considered reasonably viable for planning purposes.

This rough and quick analysis from the Amazon TCO tool gives a good impression of the level of cost savings possible with cloud-based systems. It serves as an example of some of the opportunities presented to IT managers faced with a need to control budgets and provide more services for less cost.
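To put the example figures above in perspective, the short calculation below reduces them to a savings ratio. This is only an illustrative sketch based on the published example estimates from the Amazon TCO tool cited above, not measured project costs, and it is not part of the project's source code.

// Rough comparison of the example annual TCO figures cited above (USD).
const localSanWithTapeBackup = 308981; // local SAN plus tape backup, per year
const cloudEquivalent = 37233;         // equivalent cloud option, per year

const annualSavings = localSanWithTapeBackup - cloudEquivalent;          // 271,748
const savingsPercent = (annualSavings / localSanWithTapeBackup) * 100;   // about 88%
const costRatio = localSanWithTapeBackup / cloudEquivalent;              // about 8.3x

console.log('Annual savings: $' + annualSavings.toLocaleString());
console.log('Relative savings: ' + savingsPercent.toFixed(1) + '%');
console.log('Local cost is ' + costRatio.toFixed(1) + ' times the cloud estimate');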
The potential to provide the same services for half to one quarter of the normal cost of local systems is very interesting to most enterprises. When added to the cloud-based flexibility to rapidly deploy and the freedom to scale services up and down, these factors help to explain the increased preference for cloud-based IT deployment. This preference for cloud computing now extends beyond the private sector to government enterprises seeking the benefits of the new computing models offered by cloud vendors.

The Federal Cloud & the Secure Cloud Emerge

For the federal customer, the transition to Warehouse Scale Computing and the public cloud can be dated to 2011, when the FedRAMP initiative was established. The FedRAMP program is based on policy guidance from President Barack Obama's 2011 paper titled "International Strategy for Cyberspace"28, as well as the "Cloud First" policy authored by US CIO Vivek Kundra29 and the "Security Authorization of Information Systems in Cloud Computing Environments"30 memo from Federal Chief Information Officer Steven VanRoekel. Together these documents framed the proposed revamp of all federal Information Technology systems.

In the introduction to his 2011 cloud security memo, VanRoekel provides some concise notes on the compelling reasons for the federal move to cloud computing:

"Cloud computing offers a unique opportunity for the Federal Government to take advantage of cutting edge information technologies to dramatically reduce procurement and operating costs and greatly increase the efficiency and effectiveness of services provided to its citizens. Consistent with the President's International Strategy for Cyberspace and Cloud First policy, the adoption and use of information systems operated by cloud service providers (cloud services) by the Federal Government depends on security, interoperability, portability, reliability, and resiliency.30"

Collectively, these three documents and the actions they set in motion have transformed the federal computing landscape since 2011. Just as the private sector's use of local computing has begun a rapid shift to the cloud, driven by competition and the bottom line, in the short space of five years the entire paradigm for IT in the US federal government has shifted radically. It is not unreasonable to expect that by 2020, cloud computing will be the norm, not the exception, for any federal IT system. This transition offers huge opportunities, but brings massive challenges to implement secure infrastructure in a public cloud computing space.

Functionally, the conversion from physical to electronic documents has a number of engineering requirements, but above and beyond this, there are legal and security considerations that make any document management system more complex to implement than earlier databases of disparate facts. Documents as an entity are more than a collection of facts. They represent social and legal relationships and
agreements. As such, the authenticity, integrity, longevity and confidentiality of the document as an artifact matter.

The security and privacy implications of the continued expansion of electronic exchange of data in consumer and commercial financial transactions were incorporated into the rules, regulations and policy guidance included in the Gramm-Leach-Bliley Act of 199931. A good example of the wide swath of sensitive data that needs to be protected in both physical and electronic transactions is shown in the Sensitive Data: Your Money AND Your Life web page that is part of the Safe Computing Pamphlet Series from MIT. As the page notes:

"Sensitive data encompasses a wide range of information and can include: your ethnic or racial origin; political opinion; religious or other similar beliefs; memberships; physical or mental health details; personal life; or criminal or civil offences. These examples of information are protected by your civil rights. Sensitive data can also include information that relates to you as a consumer, client, employee, patient or student; and it can be identifying information as well: your contact information, identification cards and numbers, birth date, and parents' names.32"

Sensitive data also includes core identity data, aside from the information about any particular event, account or transaction, personal preferences, or self-identified category. Most useful documents supporting interactions between people and business or government enterprises contain Personally Identifiable Information (PII), which is defined by the Government as:

"...any information about an individual maintained by an agency, including any information that can be used to distinguish or trace an individual's identity, such as name, Social Security number, date and place of birth, mother's maiden name, biometric records, and any other personal information that is linked or linkable to an individual.33"

Identity data is a special and critical subset of sensitive data, as identity data is required to undertake most of the other transactions and to interact with essential financial, government or healthcare services. As such this data must be protected from theft or alteration to protect individuals and society, as well as to ensure the integrity of other data in any digital system34.

In order to protect this PII data, the Government, through the National Institute of Standards and Technology (NIST), defines a number of best practices and security controls that form the basis for sound management of confidential information.35 These controls include such concepts as:

• Identification and Authentication - uniquely identifying and authenticating users before they access PII.
• Access Enforcement - implementing role-based access control and configuring it so that each user can access only the pieces of data necessary for the user's role.
• Remote Access Control - ensuring that the communications for remote access are encrypted.
• Event Auditing - monitoring events that affect the confidentiality of PII, such as unauthorized access to PII.
• Protection of Information at Rest - encryption of the information stored on storage disks.

In addition to these considerations, many enterprises also need to handle documents that contain both PII and medical records or data from medical records, or Protected Health Information (PHI). Medical records began to be stored electronically in the 1990's. By the early part of the 21st century this growth in electronic health records resulted in a new set of legislation designed both to encourage the switch to electronic health records and to set up guidelines and policy for managing and exchanging these records. The Health Insurance Portability and Accountability Act (HIPAA) of 1996 creates a set of guidelines and regulations for how enterprises must manage PHI36. Building on HIPAA, the American Recovery and Reinvestment Act of 2009 and the Health Information Technology for Economic and Clinical Health Act (HITECH) of 2009 added additional policy restrictions and security requirements, as well as penalties for failure to comply with the rules37. These regulations for PHI both overlap and add to the considerations for data and documents containing PII.

The HITECH law increased the number of covered organizations or "entities" from those under the control of the HIPAA legislation:

"Previously, the rules only applied to "covered entities," including such healthcare organizations as hospitals, physician group practices and health insurers. Now, the rules apply to any organization that has access to "protected health information.38"

HITECH also added considerable detail and clarification, as well as new complexity and even more stringent penalties for lack of compliance or for data exposure or "breaches". Under HITECH a breach is defined as:

"…the unauthorized acquisition, access, use or disclosure of protected health information which compromises the security or privacy of such information, except where the unauthorized person to whom such information is disclosed would not reasonably have been able to retain such information.38"

The result of the considerations needed to manage documents that might contain Sensitive Data, PII or PHI, or any combination of these elements, is that any document management system implemented in private or public data centers must
implement a wide range of technical and procedural steps to operate in a secure manner. Protection of the security, privacy and integrity of the documents and the data in those documents becomes a major part of the challenge of designing, building and operating any information system. These engineering efforts are essential to business operations; however, they also become part of the cost of any system, and as such can be a considerable burden on the budget of any enterprise.

Designing a Project to Demonstrate Using the Cloud

It is within this context of providing a secure system leveraging cloud-based benefits that the practicum project described in this paper was designed. The goal of the project was to demonstrate a viable approach to following the policy guidance as provided for federal IT systems. To achieve this goal, the first step was to understand the context as outlined in the discussion above. The next step was to design a system that followed sound cybersecurity principles and the relevant policy guidance.

Based on the demand for electronic document management in both private and government enterprise, a basic document management system was selected as the business case for the prototype to be developed. Document management provides an opportunity to implement some server-side logic for the operation of the user interface and for the selection and management of storage systems. Document management also provides a driving problem that allows for clear utilization of storage options, and thus can demonstrate the benefits of the cloud-based storage options that feature prominently in the consideration of cloud advantages of both speed of deployment and lower TCO. These considerations were incorporated in the decision to implement a document management system as the demonstration project.

The scope of the system was also a key consideration. Given the compressed time frame and limited access to developer resources that are intrinsic to a practicum project, the functional scope of the document management system would need to be constrained. With a single developer, the range of features that could be implemented would need to be limited to the basic functions needed to show proof of concept for the system. In this case, these were determined to be:

1. The system would be implemented on the Amazon EC2 public cloud for the compute tier of the demonstration.
2. The system would utilize Amazon S3 object storage as opposed to block storage.
3. The system would be implemented using commercially available, Amazon-provided security features for ensuring Confidentiality, Integrity and Availability39.
(Dimov, I. (2013, June 20). Guiding Principles in Information Security. InfoSec Resources. Retrieved July 09, 2016, from http://resources.infosecinstitute.com/guiding-principles-in-information-security/)

4. The servers used for the project would all be Linux based.
5. The system would feature a basic web interface to allow demonstration of the ability to store documents.
6. The system would use Public Key Infrastructure certificates generated commercially to meet the need to support encryption for both web and storage components.
7. The web components of the prototype would use HTTPS to enforce secure connections to the cloud-based servers and storage.
8. The system would utilize a commercial web server infrastructure suitable for scaling up to full-scale operation, but only a single instance would be implemented in the prototype.
9. The web components would be implemented in a language and framework well suited to large-scale web operations, with the ability to handle large concurrent loads.
10. Only a single demonstration customer/vendor would be implemented in the prototype.
11. The group and user structure would be developed and implemented using the Amazon EC2 console functions.
12. Only the essential administrative and user groups would be populated for the prototype.
13. The prototype would feature configurable settings for both environment and application values, set by environment variables, files, and Amazon settings tools. The current prototype phase would not introduce the database subsystem expected to be used to manage configuration in a fully production-ready version of the system.
14. Data files used in the prototype would be minimal versions of the XML files anticipated to be used in an operational system, but would only contain structure and minimal ID data, not full payloads.

In the case of a narrowly scoped prototype such as this demonstration project, it is equally critical to determine what functionality is out of scope. For this system this list included the following:

• The web interface would be left in a basic state to demonstrate proof of function only. Elaboration and extension of the GUI would be outside the scope of the work for this prototype project.
• There would be no restriction on the documents to be uploaded. Filtering vendor uploads would be outside the scope of work for this prototype.
• Testing uploads with anti-virus/malware tools would be outside the scope of this prototype project.
• Security testing or restriction of the client would be outside the scope of this project. The URL to access the upload function would be open for the prototype, and the infrastructure for user management would not be developed in the prototype.
• Load testing and performance testing of the prototype would be outside the scope of this phase of the project.
• No search capacity would be implemented to index the data stored in the S3 subsystem in the prototype project.

Proof of concept was thus defined as:

A) The establishment of the cloud-based infrastructure to securely store documents.
B) The implementation of the required minimal web and application servers with the code required to support upload of documents.
C) The successful upload of test documents to the prototype system using a secure web service.

While the scope of the project may appear modest and the restrictions for the phase to be implemented in the practicum course period are numerous, these scope limitations proved vital to completion of the project in the anticipated period. The subtle challenges to implementation of this proof-of-concept feature set proved more than adequate to occupy the time available and provided considerable scope for learning and valuable information for future projects based on cloud computing, as detailed in the subsequent sections of this paper.
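To make the in-scope upload path concrete, the sketch below shows one way such a proof-of-concept handler could be written with Node.js, Express and the AWS SDK. It is illustrative only and is not the project's source code (see the Source Code Listings section for the actual implementation); the bucket name, object key format, port and form field name are hypothetical placeholders, and it assumes the express, multer and aws-sdk packages have been installed with NPM.

// Illustrative sketch only -- not the project's source code listing.
// Assumes: npm install express multer aws-sdk
const express = require('express');
const multer = require('multer');
const AWS = require('aws-sdk');

const app = express();
const upload = multer({ storage: multer.memoryStorage() }); // hold the file in memory before sending it to S3
const s3 = new AWS.S3({ region: 'us-east-1' });              // credentials come from the instance role or environment

// Accept a single document posted from an HTML form field named "document"
app.post('/upload', upload.single('document'), (req, res) => {
  if (!req.file) {
    return res.status(400).send('No document provided.');
  }
  const params = {
    Bucket: 'example-practicum-bucket',              // hypothetical bucket name
    Key: Date.now() + '-' + req.file.originalname,   // hypothetical object key format
    Body: req.file.buffer,
    ServerSideEncryption: 'AES256'                   // request S3 server-side encryption at rest
  };
  s3.putObject(params, (err) => {
    if (err) {
      return res.status(500).send('Upload failed: ' + err.message);
    }
    res.send('Document stored as ' + params.Key);
  });
});

// Node listens on a local port; Nginx terminates HTTPS and proxies requests to it, as described later in this paper
app.listen(3000, () => console.log('Upload service listening on port 3000'));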
Planning the Work and Implementing the Project Design

To move to implementation, the next phase of the Software Development Lifecycle (SDLC), the requirements and scope limitations listed above were used to develop a basic project plan consisting of two main phases:

A) The technical implementation of the infrastructure and code through to proof of concept.
B) The documentation of the project work and production of this report/paper.

The project management of the implementation process is a critical success factor for any enterprise, no matter how large or small. This is especially true for cloud computing projects, as they often represent a significant departure from an enterprise's existing IT systems and processes. This was the case in this project as well.

While no formal Gantt or PERT chart was developed for the project plan, as there was no need to transmit the plan to multiple team members, an informal breakdown was used to guide the technical implementation in an attempt to keep it on schedule:

Week 1: Establish the required Amazon EC2 accounts and provision a basic server with a secure management account for remote administration of the cloud systems.

Week 2: Procure the required PKI certificates and then configure the certificates needed to secure access to the servers and any S3 storage used by the system. Configure the S3 storage.

Week 3: Obtain and install the required commercial web server and application server to work together and utilize a secure HTTP configuration for system access. Implement any language framework needed for application code development.

Week 4: Research and develop the required application code to demonstrate file upload and reach proof of concept. Create any required data files for testing.

Weeks 5-8: Document the project and produce the final report/paper.

In practice this proposed 8-week schedule would slip by about 4 weeks: about 2 weeks of extra work caused by the complexity and unexpected issues found in system and code development, and about 2 weeks of delays in the write-up caused by the author's relocation to a new address. These delays in schedule are not atypical of many IT projects. They serve to illustrate the importance of both planning and anticipation of potential unexpected factors when implementing new systems that are not well understood in advance by the teams involved. Allowing slack in any IT schedule, especially for new systems, is key to a successful outcome, as it allows flexibility to deal with unexpected aspects of the new system.

The very first task to be undertaken in the execution of the project plan was to establish the required Amazon Elastic Compute Cloud (Amazon EC2) accounts. EC2 is the basic cloud infrastructure service provided by Amazon. This service provides user management, security, system provisioning, billing and reporting features for Amazon's cloud computing platform. It is the central point for administration of any hosted project such as the prototype under discussion in this paper40. Because the author was an existing Amazon customer with prior EC2 accounts, the existing identification and billing credentials could be used for this project as well.

Both identity and billing credentials are critical components for this and any other cloud-based project on Amazon or any other cloud vendor. It is axiomatic that the identity of at least one responsible party, either an individual or an institution, must be known for the cloud vendor to establish systems and accounts in its infrastructure. This party acts as the "anchor" for any future security chain to be established.
The primary account will act as the ultimate system owner and will be responsible for the system's use or abuse and for any costs incurred. Below is an example home screen for the author's project on EC2:

Responsibility for costs is the other key aspect of the primary EC2 account. While cloud computing may offer cost savings benefits, it is by no means a free service. Every aspect of the EC2 system is monetized and tracked in great detail to ensure correct and complete billing for any features used by an account holder. Some basis for billing must be provided at the time any account is established. In the case of this project, all expenses for the EC2 features used would be billed back to the author's credit account previously established with Amazon.

In any cloud project it is vital that each team member committing to additional infrastructure understands that there will be a bill for each feature used. Amazon and most cloud vendors offer a number of planning and budgeting tools for projecting the costs of features before making a commitment. This is helpful, but it is not a substitute for clearly communicating and planning for costs in advance among the development team members and the project owners, stakeholders and managers. In the case of this project, while the author did reference the budgeting tools to note cost estimates, communication and decisions were simple due to the single-person team. Below is an example of the billing report console:
Establishment of the basic account for the project was, as indicated, simple due to the author having an existing EC2 account. To provision a server, it was necessary to determine the configuration most appropriate for the project's needs, and then determine the Amazon Availability Zone where the server should be located. The server configuration would be decided by estimating the performance characteristics needed to host the required software and execute the application features for the anticipated user load. In this case, all these parameters were scoped to be minimal for the prototype to be created, reducing the capacity of virtual server required. Based on the author's experience with Linux servers, a small configuration would meet the needs of the project. Using the descriptive materials provided by Amazon detailing server performance, a modest configuration of server was selected to host the project:

• t2.micro: 1 GiB of memory, 1 vCPU, 6 CPU Credits/hour, EBS-only, 32-bit or 64-bit platform41

When the server was provisioned, Red Hat was selected as the OS. Other Linux distributions and even Windows operating systems were available from Amazon EC2. Red Hat was selected in order to maintain maximum compatibility with the federal systems currently approved for use in production, per the author's personal experience. Use of Red Hat Linux also makes getting support and documentation for any open source tools from the Internet easier, as this is a popular distribution for web-based systems. Below is a release description from the virtual instance as configured on EC2 for this project:
By default the server was provisioned in the same zone as the author's prior EC2 instances, which was us-west-2 (Oregon). An Availability Zone (zone) is the Amazon data center used to host the instance. Availability Zones are designed to offer isolation from each other in the event of a service disruption in any one zone. Each zone operates to the published Service Level Agreement provided by Amazon42. Understanding the concept of zone isolation and the key provisions of the SLA provided by a cloud vendor is important to the success of any cloud-based project. Highly distributed applications, or those needing advanced fault tolerance and load balancing, might choose to host in multiple zones. For the purposes of this project a single zone and the SLA offered by Amazon were sufficient for successful operation.

However, the default zone allocation was problematic and was the first unexpected implementation issue. Almost all EC2 features are offered in the main US zones, but us-east-1 (N. Virginia) does have a few more options available than us-west-2 (Oregon). In order to explore the implications and effort needed to migrate between zones and ensure access to all potential features, the author decided to migrate the project server to the us-east-1 zone.

Migration involved a backup of the configured server, which appeared to be a prudent operational activity anyway. Following the backup, the general expectation was that the instance could be restored directly in the desired location and then the old instance could be removed. In general this expectation proved to be sound, but the exact steps were not so direct. Some of the complexity was strictly due to needing to allow for replication time. Some of the complexity proved to be due to the use of an Elastic IP address that creates a public IP address for the server. An AWS Elastic IP provides a static public IP that can then be associated with any instance on EC2, allowing public DNS configuration to then be re-mapped as needed to any collection of EC2 servers. The author had a prior Elastic IP and expected to
just re-use it for this project, but as noted in the AWS EC2 documentation, "An Elastic IP address is for use in a specific region only43". This created an issue when the instance was migrated across zones. Once the problem was understood, the solution was to release the old Elastic IP and generate a new Elastic IP that could be mapped using DNS. This new Elastic IP could be associated with the servers now restored to us-east-1 (N. Virginia). This step wound up taking quite a bit of time to debug and fix in the first week, and was to lead to the next unexpected issues with DNS.

None of this work was so complex as to put the project at risk. This required IP change does illustrate the fact that understanding the SLA and restrictions of each cloud feature is critical. Small issues like requiring a change of IP address can have big implications for other work in a project. Decisions to provision across zones are easy in the cloud, but can have unintended consequences, such as this IP address change and the subsequent DNS work it generated. All of these issues take resources and cost time in a project schedule.

An existing domain, Juggernit.com, already registered to the author, was the expected target domain. Since one of the requirements for the project was to get a public key certificate for the project site, it was essential to have a publicly registered Internet domain to use for the PKI. Once the public IP was re-established in the new us-east-1 zone, and connectivity was confirmed by accessing the instance using SSL, the next unexpected task was moving the DNS entries for the instance from the current registrar. This would also include learning to configure the Amazon Elastic Load Balancer and then map the domain to it. The load balancer forwards any HTTP or HTTPS traffic to the HTTPS secure instance. The HTTPS instance is the final target for the project.

Amazon Elastic Load Balancing is a service that both distributes incoming application traffic across multiple Amazon EC2 instances and allows for complex forwarding to support forcing secure access to a domain. While the project would not have many servers in the prototype phase, the use of load balancing would reflect the "to be" state of a final production instance and allow secure operations even in the development and preliminary phases of the project used for the practicum scope. The load balancer configuration would require a domain record of the form:

juggerload1-123781548.us-east-1.elb.amazonaws.com (A Record)

As noted on the Amazon web site, you should not actually use an "A Record" in your DNS for a domain under load balancing:

Because the set of IP addresses associated with a LoadBalancer can change over time, you should never create an "A record" with any specific IP address. If you want to use a friendly DNS name for your load balancer instead of the name generated by
the Elastic Load Balancing service, you should create a CNAME record for the LoadBalancer DNS name, or use Amazon Route 53 to create a hosted zone. For more information, see Using Domain Names With Elastic Load Balancing44.

The Juggernit.com domain was being managed by Network Solutions. Unfortunately, the GUI used by Network Solutions did not allow for the entry of the CNAME record formats needed for EC2. This required moving the domain out of the control of Network Solutions and into the Amazon Route 53 domain management service. The Route 53 service has a variety of sophisticated options, but most critically, it interoperates well with other Amazon EC2 offerings, including the load balancing features45.

Route 53 is a good example not only of an unexpected issue that must be overcome to migrate to the cloud, but also of how the nature of the cloud platform creates a small "ecosystem" around the cloud vendor. Even when striving for maximum standards compliance and openness, the nature of cloud platform offerings such as load balancing tends to create interoperability issues with older Internet offerings like those for DNS from Network Solutions, which date from the origin of the commercial Internet. The author had used Network Solutions DNS since the late 1990's, but in this instance there was no quick path to a solution other than migration to the Amazon Route 53 offering. The Juggernit.com domain would need to be linked to the public IP of the instance, and pragmatically this was only achievable via Route 53 services. Once the situation was analyzed, after consultation with both Network Solutions and Amazon support, the decision to move to Route 53 was made. The changes were relatively quick and simple using the Network Solutions and Amazon web consoles. Waiting for the DNS changes to propagate imposed some additional time, but as with the zone migration, the delay was not critical to the project schedule.

With the server, public IP address and DNS issues resolved, PKI certificate generation could be attempted. The author was relatively experienced in the generation and use of PKI credentials, but once again the continued evolution of the Internet environment and of cloud computing standards was to provide unexpected challenges to the actual implementation experience. There are many vendors offering certificates suitable for this practicum project, including Amazon's own new PKI service. The author selected Network Solutions as the PKI provider. Using another commercial certificate vendor offered an opportunity to explore the interoperation of Amazon's platform with other public offerings. Network Solutions also has a long history with the commercial Internet and has a well-regarded, if not inexpensive, certificate business46. The certificates were issued in a package including both the typical root certificate most Internet developers are used to, as well as a number of intermediate
certificates that were less familiar to the author. In most cases inside an enterprise, certificates are issued for enterprise resources by trusted systems and all the intermediate certificates are often in place already. This was not the case for the Amazon EC2 infrastructure for this project. In this instance, not only was the root certificate needed, but all the intermediates also had to be manually bundled into the uploaded package47. This was a new process for the author, and management of intermediate certificates represented another unexpected task.

The need to include the intermediate certificates in the upload to Amazon was not immediately apparent, and debugging the reason why uploading just the root certificate did not work (as with prior systems) was going to involve a major research effort and many hours of support diagnostics with each vendor involved. To make the issue more complex, there was documentation the Amazon support team found for some certificate vendors, and there was documentation for cloud service vendors found by Network Solutions support, but neither firm had documents for working with certificates or cloud services from the other – this was the one case not documented anywhere. The Network Solutions certificates were issued using a new naming format that did not follow the older Network Solutions documentation, making it difficult to identify the proper chaining order. Amazon was also not totally sure what order would constitute a working package. A number of orders had to be tried and tested one at a time, and then the errors diagnosed for clues as to the correct order needed in the concatenation command.

On top of this, the actual Linux command to concatenate, and hence chain, the certificates was not exactly correct when attempted. This was due to the text format at the end of the issued certificates. Manual editing of the files was needed to fix the inaccurate number of delimiters left in the resulting text file. The final command needed for the Amazon load balancer was determined to be:

> amazon_cert_chain.crt; for i in DV_NetworkSolutionsDVServerCA2.crt DV_USERTrustRSACertificationAuthority.crt AddTrustExternalCARoot.crt ; do cat "$i" >> amazon_cert_chain.crt; echo "" >> amazon_cert_chain.crt; done

This back-and-forth diagnostic work for certificate chains represented a major unexpected source of complexity and extra work. Again, this did not disrupt the execution schedule beyond a recoverable limit. The experience with certificate chaining was a valuable learning opportunity on the pragmatic use of PKI tools. The author has subsequently come across a number of federal IT workers encountering these challenges as more and more systems start to include components from outside vendors in the internal enterprise infrastructure.
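One practical way to shorten this kind of chain debugging is to inspect the chain that the endpoint actually presents after each upload attempt. The short Node.js sketch below is a diagnostic aid only, not part of the project's source code listings; it uses Node's built-in tls module, and the host name is simply the project domain discussed above.

// Illustrative diagnostic sketch (not part of the project source code listings).
// Prints the certificate chain actually presented by the HTTPS endpoint, which
// helps confirm whether the intermediate certificates were bundled correctly.
const tls = require('tls');

const host = 'www.juggernit.com'; // the project domain discussed above
const socket = tls.connect(
  { host: host, port: 443, servername: host, rejectUnauthorized: false }, // allow inspection even if the chain is incomplete
  () => {
    let cert = socket.getPeerCertificate(true); // true = include the issuer chain
    while (cert && Object.keys(cert).length > 0) {
      console.log(cert.subject.CN + '  (issued by: ' + cert.issuer.CN + ')');
      if (cert.issuerCertificate === cert) break; // a self-signed root ends the chain
      cert = cert.issuerCertificate;
    }
    socket.end();
  }
);
socket.on('error', (err) => console.error('TLS error: ' + err.message));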
Node.JS and the Express framework are used as the application server. Each of these subsystems provided further opportunities for learning as they were installed. Nginx was selected to provide an opportunity to gain experience with this very popular commercial platform, as well as for its reputation for high performance and its excellent ability to scale and support very high traffic web sites. Nginx was designed from the start to address the C10K problem (10,000 concurrent connections) using an asynchronous, non-blocking, event-driven connection-handling algorithm48. This is very different from the approach taken by Apache or many other available web servers. In the author's experience, many web sites that start out with more traditional web servers such as Apache experience significant scale issues as they grow due to high volumes of concurrent users. Starting with Nginx was an attempt to avoid this problem by design, though installation and configuration of the web server was more complex. The open source version of Nginx was used for the project, as a concession to cost management. Downloading the correct code did prove to be somewhat of an issue, as it was not easy to find the correct repositories for the current package, and it then turned out the application had to be updated before it could function. It was also critical to verify the firewall status once the system was accepting connections. The Amazon install of Red Hat Linux turns out to disable the default firewall and instead relies on the Amazon built-in firewalls for the site. This actually provides a very feature-rich GUI for firewall configuration, but is another non-standard operations detail for those familiar with typical Red Hat stand-alone server operations. The firewall was another implementation detail that could not easily be anticipated. After the firewall was sorted out, there remained considerable research to determine how to configure the Nginx web server to use HTTPS based on the certificates for the domain. Again, the issue turned out to be due to the chaining requirements for the certificate. In this case, Nginx needed a separate and different concatenated package in this format:

cat WWW.JUGGERNIT.COM.crt AddTrustExternalCARoot.crt DV_NetworkSolutionsDVServerCA2.crt DV_USERTrustRSACertificationAuthority.crt >> cert_chain.crt

After determining the correct concatenation format needed for Nginx and making the appropriate uploads of concatenated files, HTTPS services were available end to end. However, Nginx does not provide dynamic web services. To serve dynamic content it would be necessary to install and configure the Node.JS web application server and the Express framework. Node.JS (Node) is an open source server-based implementation of the JavaScript language originally developed by Ryan Dahl in 2009 using both original code and
material from the Google V8 JavaScript engine. Most significantly, Node is event-driven and uses a non-blocking I/O model. This makes Node both very fast and very easy to scale. Node is extremely well suited to situations like the C10K problem and to web sites that must scale quickly and efficiently. Being based on JavaScript, Node is object-oriented and offers a huge open source support base of modules and libraries, accessed using the Node Package Manager (NPM). Express is a minimal and flexible Node.js web application framework based on many of the ideas about web site design and development taken from the Ruby on Rails framework project. Express offers a set of standard libraries and allows users to mix in many other NPM tools to create web sites based on the original Ruby on Rails principle of "convention over configuration" by providing a common structure for web apps49. Installation of Node on the server was done using the standard Red Hat Package Manager tools. Once Node is installed, the Node Package Manager (NPM) system can be used to bootstrap the loading of any other packages, such as the Express framework. In a production system it is expected that the web server and the application server would be hosted on separate hardware instances, but since the practicum was to be subject to only a small load, both servers can run on the same instance of Linux with little impact. While Node comes with its own dynamic web server to respond to requests for dynamic web content, it is not well suited to heavy-duty serving on the front end. Nginx is designed for the task of responding to high volumes of initial user inquiries. The combination of a high-performance web server (Nginx) and some number (N) of application server instances (such as Node) is a widely accepted pattern that supports large-scale web systems. Implementation of this design pattern was a goal of the prototype, to pre-test integration of all the constituent components even prior to any load testing of the system. Deployment and configuration of Nginx and Node to the single Linux server fulfills this requirement and provides a working model that can be expanded to multiple servers as needed in the future. In order to smoothly transfer web browser requests from users to the application server domain, the web server must act as a reverse proxy for the application server. To accomplish this with Nginx requires the addition of directives inside the "server" section of the Nginx configuration file. These commands instruct the web server to forward web traffic (HTTPS) requests for dynamic pages targeted at the DNS domain from Nginx to Node.JS. This is a relatively standard forwarding setup for Nginx and only requires a small amount of research to verify the correct server configuration directive, as shown in this example from the Nginx documentation:

server {
  # here is the code to redirect to node on 3000
  location / {
    proxy_set_header X-Forwarded-For $remote_addr;
    proxy_set_header Host $http_host;
    proxy_pass "http://127.0.0.1:3000";
  }
}

Note that this is just an example for use on localhost with a Node.JS engine running on port 3000 (any port will suffice). The critical issue is to configure Nginx to act as a reverse proxy to the Node.JS engine. Nginx will then send traffic to the configured port for the Node.JS application instance. Node.JS and Express then use a RESTful approach to routing to the application logic based on parsing the URL. The reverse proxy configuration ensures that when traffic comes into the Nginx server with the format "HTTPS://Juggernit.com/someurl" it will be handled by the appropriate logic section of the Node.JS application as configured in the Express framework. The Express listener will catch the traffic on port 3000 and use the route handler code in Express to parse the URL after the slash and ensure that the proper logic for that route is launched to provide the service requested. This is a well-established RESTful web design pattern, first widely popularized in Ruby on Rails and adopted by a number of web frameworks for environments such as Java, Node, and Python. Implementing this pattern requires that both Nginx and Node be installed on the server as a prerequisite. In addition, the Express framework for web applications used by Node must also be loaded to allow at least a basic test of the forwarding process. All of this code is available as open source, so access to the needed components was not a blocker for the project. Each of these components was first loaded onto the Author's local Unix system (a MacBook Pro running OS X). This allowed for independent and integration testing of the Nginx web server, the Node application server, and the Express web framework. By altering the configuration file and adding the appropriate directives as noted above, the reverse proxy configuration and function could be tested locally against the localhost IP address. After validation of the configuration requirements locally on the Author's development station, the web server and application server both needed to be installed on the cloud server. As noted above, Nginx was actually loaded on the cloud server earlier to allow for configuration of the domain and HTTPS secure access to the site. This left only the installation of the Node and Express application server components. While conceptually easy, in practice loading Node also proved to provide unexpected challenges. The 7.x Red Hat version of Linux installed on the cloud server supports Node in the RPM package manager system. However, the available RPM version was only 0.10.xx. The current version of Node is
4.4.x. The stable development version installed on the Author's local system was 4.4.5 (provided from the Node web site). There are substantial syntax and function differences between the earlier version of Node and the current version. This required that the Node install on the cloud server be updated, and that proved to require help from the Amazon support team, as following the default upgrade instructions did not work. Again, the delay was not large, but it cost a couple of days between testing, exploration of options, and final correction of the blocking issues. The final install of a current 4.4.x version of Node required a complete uninstall of the default version, as upgrading resulted in locked RPM packages. After cleaning up the old install and loading the new Node version, the cloud server was brought into conformance with the required Node version. The Express framework was loaded on the server via the standard command-line Node Package Manager (NPM) tool. A simple "Hello World" test web application was created in Express/Node, and again the function of both the Nginx and Node servers was validated (a sketch of this kind of test application is shown below). To verify web and application server function, an Amazon firewall change was required to allow Node to respond directly to traffic pointed at the server's IP address and the Node server's port number (3000). This firewall rule addition allowed testing of HTTPS traffic targeted at the domain name, which was served by Nginx. HTTP traffic directed to the IP address and port 3000 could then be tested at the same time, as this traffic was served by the test Node/Express application. To complete the integration, the next step was to reconfigure the Nginx server to act as a reverse proxy. The Nginx configuration file was backed up, the reverse proxy directives shown above were added to the configuration file, and Nginx was reloaded to reflect the changes. At this point, Nginx no longer provided its default static web page in response to requests sent to HTTPS://Juggernit.com. Instead, Nginx forwarded the HTTPS traffic to the Node application server, still under the secure connection, and Node responded with the default "Hello World" page as configured in the Express test application. This state represented a complete integration of Nginx and Node for the project. The server was backed up, and the next stage of work to implement the upload logic to store data on the Amazon S3 object store could continue. The two major tasks required to finish the site configuration and functional completion of the prototype project were:
• Establishment of an Amazon S3 storage area (known as a "bucket" on Amazon)
• Coding server and client logic to access the S3 storage via HTTPS
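Before turning to those two tasks, here is a minimal sketch of the kind of "Hello World" Express test application referred to above. The exact code used on the project is not reproduced here; the loopback address and port 3000 are assumptions chosen to match the reverse proxy configuration shown earlier.

// Minimal "Hello World" Express test application (illustrative sketch).
// Assumes Express was installed with "npm install express".
var express = require('express');
var app = express();

// A single route at the site root, reached through the Nginx reverse proxy.
app.get('/', function (req, res) {
  res.send('Hello World');
});

// Listen on the loopback interface; Nginx terminates HTTPS and forwards here.
app.listen(3000, '127.0.0.1', function () {
  console.log('Test application listening on http://127.0.0.1:3000');
});

Pointing a browser at the HTTPS domain should then return the "Hello World" response once the proxy_pass directive is in place, which is exactly the integration check described above.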
The first of these tasks could be accomplished directly via the Amazon EC2 management console. For the prototype there was no requirement for a custom web interface to create S3 storage, and no requirement for any automatic storage assignment or management. In a fully realized production application it is possible that application-based management of storage might be desirable, but this is a system feature requirement highly subject to enterprise policy and business case needs. However, even when using the Amazon interface to manage S3 storage as in this project, there was still a need to consider the user and group structure in order to manage access security to the S3 storage. As discussed earlier in the paper, a default EC2 account assumes that the owner is granted all access to all resources configured by that owner in the Amazon cloud infrastructure. For this reason, it is important to create separate administrative accounts for resources that require finer-grained access and might also require access restrictions. In a fully realized web application hosted on local servers, this user and group management is often done at the application level. For this prototype these considerations were managed through the Amazon EC2 interface. Prior to setting up a storage area on the S3 object storage, an administrator group named "admins" was created, with full permissions to manage the site resources. Another group called "partners" was created with access to the S3 storage, but without access to the other site resources used to manage the servers. A user named "testone" was then created and added to the "partners" group. The Author used the primary Amazon identity to build and manage the site, but the administrative group was constructed so that any future web-based management functions could be separated from user-oriented functions of the prototype web application. With the users and groups established, the S3 storage called "ctprojectbucketone" was created using the standard Amazon GUI. Below is a screenshot showing this bucket:
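The same resources can also be created programmatically rather than through the console. The following is a rough sketch using the AWS-SDK library for Node; the bucket, group, and user names come from the text above, while the region and the assumption that credentials are already configured (for example, via environment variables) are illustrative choices, not the project's actual settings.

// Sketch: creating the prototype's bucket, group, and user with the aws-sdk
// Node library instead of the Amazon web console.
var AWS = require('aws-sdk');
AWS.config.update({ region: 'us-east-1' }); // region is an assumption

var s3 = new AWS.S3();
var iam = new AWS.IAM();

// Create the object storage bucket named in the text.
s3.createBucket({ Bucket: 'ctprojectbucketone' }, function (err) {
  if (err) { return console.error('Bucket creation failed:', err); }
  console.log('Bucket ctprojectbucketone created');
});

// Create the "partners" group and the "testone" user, then add the user to the group.
iam.createGroup({ GroupName: 'partners' }, function (err) {
  if (err) { return console.error('Group creation failed:', err); }
  iam.createUser({ UserName: 'testone' }, function (err) {
    if (err) { return console.error('User creation failed:', err); }
    iam.addUserToGroup({ UserName: 'testone', GroupName: 'partners' }, function (err) {
      if (err) { return console.error('Adding user to group failed:', err); }
      console.log('User testone added to group partners');
    });
  });
});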
To manage access rights, the S3 storage was then assigned a Cross-Origin Resource Sharing (CORS) access policy that allowed GET, POST, and PUT permissions to the S3 storage, as shown below: The "partner" group was assigned access to this storage by providing them with the resource keys. With the creation of the S3 Object Storage "bucket", the remaining task to reach functional proof of concept for the prototype project was to construct the JavaScript application code to access the S3 storage bucket securely from the Internet. To create the logic for bucket access there were a number of prerequisite steps not emphasized so far. The most significant of these steps was to develop at least a basic familiarity with Node.JS and JavaScript. While the author possesses several years of experience with using JavaScript in a casual manner for other web applications, site development in JavaScript was a very different proposition. Node also has its own "ecosystem" of tools and libraries, much like any emerging open source project. Some understanding of these was also essential to succeed in creating the code required to achieve a proof of concept function for the prototype site. As a starting point, the main Node site, https://nodejs.org/en/, provided an essential reference. In addition, the author referenced two very useful textbooks:
• Kiessling, Manuel. "The Node Beginner Book." Available at [last accessed: 18 March 2013]: http://www.nodebeginner.org (2011).
• Kiessling, Manuel. "The Node Craftsman Book." Available at [last accessed: 25 October 2015]: https://leanpub.com/nodecraftsman (2015).
These proved to be essential in providing both background on Node and some guidance on the use of the Express application framework. In addition, a number of other small Node library packages were key to creating the required code, specifically:
• Node Package Manager (NPM) – a Node tool for getting and managing Node packages (libraries of functions). https://www.npmjs.com
• Express – a Node library providing an application framework for RESTful web applications based on the concepts from Ruby on Rails. https://expressjs.com
• Dotenv – a Node library that allows loading environment variables from a configuration file with the extension .env. This was used to allow passing critical values such as security keys for S3 storage in a secure manner from the server to a client. https://www.npmjs.com/package/dotenv
• EJS – a Node library that allows embedded JavaScript in an HTML file. This was used to add the required logic to communicate with the server components of the application and then access the S3 bucket from the client page using values securely passed over HTTPS. https://www.npmjs.com/package/ejs
• AWS-SDK – a Node library provided by Amazon to support basic functions for the S3 storage service to be accessed by Node code. https://www.npmjs.com/package/aws-sdk
As a newcomer to Node, the most critical problem for the Author in creating this code was a lack of standard examples of S3 access using a common approach, explained at a sufficiently simple level. There are actually at least dozens of sample approaches to integration of S3 storage in Node projects, but almost all use very idiosyncratic sets of differing libraries or do not address some critical but basic aspect of the prototype, such as secure access. There are also a number of very sophisticated and complete examples that are almost incomprehensible to the Node novice. This inability to find a clear and functional pattern to learn from caused a major delay of over a week and a half in completion of the final steps of the prototype. After considerable reading, coding, and searching for reference models, the Author finally came across a tutorial from Dr. Will Webberly of the Cardiff University School of Computer Science & Informatics. The author read, studied, and analyzed the example provided. The next step was to create several test programs to adapt the approach Dr. Webberly documented for a Heroku cloud instance to a local Node/Express project50. After some trial and error and some correspondence with Dr. Webberly via email, a working set of code emerged. The final proof of concept function was a minimal web application based on the pattern used by Dr. Webberly, running on a cloud-based server as an Express application using locally configured variables on the Amazon EC2 server. The server code provides a RESTful service over HTTPS that allows a client web page executing on a remote PC or device to upload to the S3 storage using HTTPS. Below is a screenshot of some of the server-side code:
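The full server listing appears in the Source Code Listings at the end of this paper. As a condensed, illustrative sketch of the approach (adapted from the signed-URL pattern in the tutorial cited above), the signing endpoint might look roughly like the following; the /sign-s3 route, the query parameter names, and the S3_BUCKET environment variable are assumptions for illustration, not the project's exact identifiers.

// Illustrative sketch of a signing endpoint in the spirit of the project's app.js.
// The client asks this route for a pre-signed S3 URL over HTTPS, then uploads
// the file directly to S3 using that URL.
var dotenv = require('dotenv');
dotenv.load();                      // loads S3_BUCKET, AWS keys, etc. from .env

var express = require('express');
var aws = require('aws-sdk');
var app = express();

app.get('/sign-s3', function (req, res) {
  var s3 = new aws.S3();
  var fileName = req.query['file-name'];
  var fileType = req.query['file-type'];
  var s3Params = {
    Bucket: process.env.S3_BUCKET,  // e.g. ctprojectbucketone
    Key: fileName,
    Expires: 60,                    // the signed URL is valid for 60 seconds
    ContentType: fileType
  };
  s3.getSignedUrl('putObject', s3Params, function (err, signedUrl) {
    if (err) { return res.status(500).end(); }
    // Return the one-time upload URL and the object's eventual location.
    res.json({
      signedRequest: signedUrl,
      url: 'https://' + process.env.S3_BUCKET + '.s3.amazonaws.com/' + fileName
    });
  });
});

app.listen(process.env.PORT || 3000);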
The upload page logic is provided by the project web site, as is the back-end server logic. Since the client page is running on a remote device, the entire transfer is done using client resources. The prototype project site provides only context and security data, but is not used to manage the upload. This frees server-side resources from the work of the transfer and thus creates a higher-performance distributed system. The exchange of logic and credentials is all done over the HTTPS protocol with the client, as is the subsequent file upload. This provides a secure method of access to the cloud-based S3 storage. Client-side data from the partner is encrypted in transfer, and no other parties besides the partner and the prototype project operations teams have access to the S3 bucket. For purposes of the prototype only one client identity and one bucket were produced. In a fully realized system, there could be unique buckets for each client, subject to the security and business rules required by the use case of the system. After establishing that the Node logic was in fact working and successfully uploaded files to the S3 storage, a small set of sample health records based on the Veterans Administration Disability Benefits Questionnaires (DBQs)51 were constructed. Below is a sample of one of these files:
These simulated DBQ records were then uploaded as a test and verified as correct using the Amazon S3 GUI to access the documents. PDF format was used for the test files to make them directly readable via standard viewing tools. Here is a screenshot of the uploaded test files in the Amazon S3 bucket: This test represents uploading the sort of sensitive and confidential data expected to be collected and managed in any finished system based on the prototype project. While basic in function, the creation and upload of these documents provided the final steps in the implementation of this phase of the prototype project. Below is a screenshot showing the selection of a DBQ for upload using the client-side web page:
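To complement the screenshots, the following is a rough sketch of the client-side logic involved in such an upload page. The element IDs, the /sign-s3 route, and the response field names are assumptions kept consistent with the server sketch earlier, not the project's actual identifiers.

// Illustrative client-side sketch: request a signed URL from the server over
// HTTPS, then PUT the selected DBQ file directly to S3 from the browser.
function uploadSelectedFile() {
  var file = document.getElementById('file-input').files[0];
  if (!file) { return alert('Please choose a DBQ file to upload.'); }

  // Step 1: ask the application server for a signed upload URL.
  var signReq = new XMLHttpRequest();
  signReq.open('GET', '/sign-s3?file-name=' + encodeURIComponent(file.name) +
                      '&file-type=' + encodeURIComponent(file.type));
  signReq.onload = function () {
    if (signReq.status !== 200) { return alert('Could not obtain signed URL.'); }
    var response = JSON.parse(signReq.responseText);

    // Step 2: upload the file straight to S3 using the signed URL.
    var uploadReq = new XMLHttpRequest();
    uploadReq.open('PUT', response.signedRequest);
    uploadReq.setRequestHeader('Content-Type', file.type); // must match the signed ContentType
    uploadReq.onload = function () {
      if (uploadReq.status === 200) {
        document.getElementById('status').textContent = 'Upload complete.';
      } else {
        alert('Upload to S3 failed.');
      }
    };
    uploadReq.send(file);
  };
  signReq.send();
}

Because the PUT goes from the browser to S3 over HTTPS, the server never handles the file bytes, which is the distributed, client-controlled transfer described above.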
Storing these files represents the completion of the major design goals of the project, the implementation phase, and the prototype project itself.

Findings, Conclusions and Next Steps

While achieving the successful secure upload of the test documents to the prototype meets the objectives set out for this project, it represents only the first milestone in extending the system to a more full-featured platform and in exploring additional topics of interest in this area. The architecture implemented offers a good example of the latest non-blocking, asynchronous approach to serving web content. These designs exploit CPU resources in very different ways from traditional code and web frameworks, and there is ample room for scale and load testing to measure the actual capacity of these systems to perform on 64-bit architectures. The asynchronous and distributed, client-controlled approach to storage access also provides an opportunity to test the capacity of the S3 interface to support concurrent access. The results should provide tuning direction on the number of, and partitioning rules for, the S3 storage buckets. A larger-scale simulation with many more virtual clients would be a natural approach to measuring the capacity of this use pattern. The web site functions also offer an opportunity to expand the functionality of the system and demonstrate more advanced, fine-grained access controls supported by the user and group model. At a minimum, a database of administrators and partners can be created both to lock the site down from casual access and to explore the minimal levels of access needed to still meet all functional needs. Driving each role to the absolute lowest level of privilege will likely require trial and error, but should help ensure the site presents a minimal profile to any potential attackers.
In addition to these operations-oriented areas of future research, once a larger data set is simulated, the ability of the S3 storage to support search indexing of the XML data is a rich area of exploration. There is emerging federal guidance on best practices for meta-data tagging of PII and PHI data, and this prototype would allow for an easy way to create versions of S3 buckets with a variety of meta-data patterns and then determine the most efficient search and index options for each with a higher volume of simulated data. An expanded prototype could act as a test platform for future production systems, revealing both physical and logical performance metrics. Each of these future options provides scope to expand the project, but the basic implementation also provides some important benefits:
• The implementation of the system shows that it is pragmatic to store sensitive data on a public cloud-based system, using PKI infrastructure to protect the data from both external access and access by the cloud vendor.
• The design of the prototype shows that modest cloud resources can in fact be used to host a site with the capacity to distribute the workload, using HTTPS to secure the data streams and leveraging client resources, not just central server capacity, to support data upload.
• The prototype shows that it is relatively easy to use Object Storage to acquire semi-structured data such as XML. This validates use of an Object Store as a form of document management tool beyond block storage.
• The establishment of the project in only a few weeks with limited staff hours shows the cost and speed advantages of the cloud as opposed to local physical servers.
• The experience with both the cloud and new web servers and languages demonstrates the importance of flexible scheduling and allowing for the unexpected. Even on projects that leverage many off-the-shelf components, unexpected challenges often show up and consume time and resources.
The prototype produced as a result of this project does meet the guidance for building secure projects on a public infrastructure. It allows PII and PHI data to be transferred to an enterprise via secure web services, and demonstrates an approach that can satisfy many enterprise requirements and the guidelines for HIPAA and HiTech data handling. The architecture used demonstrates how a scalable web service model can be implemented using a cloud infrastructure by a small team in a limited time. The model provides only a basic proof of concept, but it offers easy opportunities for expansion to explore a number of additional questions. As such, the resulting site can be considered a success at meeting its design goals, and the information generated in the site development can be employed by both the Author and others for future work in cloud computing implementation for secure digital document storage.
  34. 34. Secure File Management Using the Public Cloud Masters of Cybersecurity Practicum Project, ISM 6905– Cecil Thornhill Masters Project CThornhill v2 final.docx7/13/16 Page 34 of 46 References 1. Oppenheim, A. L. (Ed.). (1967). Letters from Mesopotamia: Official business, and private letters on clay tablets from two millennia. University of Chicago Press. Page 1-10 2. Fang, I. (2014). Alphabet to Internet: Media in Our Lives. Routledge. Page 90-91 3. Noam, E. M. (1992). Telecommunications in Europe (pp. 363-368). New York: Oxford University Press. Page 15-17 4. Moroney, R. L. (1983). History of the US Postal Service, 1775-1982 (Vol. 100). The Service. 5. John, R. R. (2009). Spreading the news: The American postal system from Franklin to Morse. Harvard University Press. Page 1-25 6. Johnson, P. (2013). The birth of the modern: world society 1815-1830. Hachette UK. 7. Currie, R. (2013, May 29). HistoryWired: A few of our favorite things. Retrieved May 15, 2016, from http://historywired.si.edu/detail.cfm?ID=324 8. Standage, T. (1998). The Victorian Internet: The remarkable story of the telegraph and the nineteenth century's online pioneers. London: Weidenfeld & Nicolson. 9. Yates, J. (1986). The telegraph's effect on nineteenth century markets and firms. Business and Economic History, 149-163. 10. Du Boff, R. B. (1980). Business Demand and the Development of the Telegraph in the United States, 1844–1860. Business History Review, 54(04), 459-479. 11. Gordon, J. S. (2002). A thread across the ocean: the heroic story of the transatlantic cable. Bloomsbury Publishing USA. 12. Ross, C. D. (2000). Trial by fire: science, technology and the Civil War. White Mane Pub. 13. Bates, D. H. (1995). Lincoln in the telegraph office: recollections of the United States Military Telegraph Corps during the Civil War. U of Nebraska Press.
  35. 35. Secure File Management Using the Public Cloud Masters of Cybersecurity Practicum Project, ISM 6905– Cecil Thornhill Masters Project CThornhill v2 final.docx7/13/16 Page 35 of 46 14. Coopersmith, J. (2015). Faxed: The Rise and Fall of the Fax Machine. JHU Press. 15. Cortada, J. W. (2000). Before the computer: IBM, NCR, Burroughs, and Remington Rand and the industry they created, 1865-1956. Princeton University Press. 16. Smith, E. (2016, June 14). The Strange History of Microfilm, Which Will Be With Us for Centuries. Retrieved June 22, 2016, from http://www.atlasobscura.com/articles/the-strange-history-of-microfilm- which-will-be-with-us-for-centuries 17. Bush, V., & Think, A. W. M. (1945). The Atlantic Monthly. As we may think, 176(1), 101-108. 18. Mohamed, A. (2015, November). A history of cloud computing. Retrieved July 07, 2016, from http://www.computerweekly.com/feature/A-history-of- cloud-computing 19. Electric Light and Power System - The Edison Papers. (n.d.). Retrieved July 13, 2016, from http://edison.rutgers.edu/power.htm 20. The discovery of electicity - CitiPower and Powercor. (n.d.). Retrieved July 13, 2016, from https://www.powercor.com.au/media/1251/fact-sheet- electricity-in-early-victoria-and-through-the-years.pdf 21. Powering A Generation: Power History #1. (n.d.). Retrieved July 13, 2016, from http://americanhistory.si.edu/powering/past/prehist.htm 22. Electricity - Switch Energy Project Documentary Film and ... (n.d.). Retrieved July 13, 2016, from http://www.switchenergyproject.com/education/CurriculaPDFs/SwitchCur ricula-Secondary-Electricity/SwitchCurricula-Secondary- ElectricityFactsheet.pdf 23. Tita, B. (2012, November 6). A Sales Surge for Generator Maker - WSJ. Retrieved July 13, 2016, from http://www.wsj.com/articles/SB100014241278873248941045781033340 72599870 24. Residential Generators, 3rd Edition - U.S. Market and World Data. (n.d.). Retrieved July 13, 2016, from https://www.giiresearch.com/report/sbi227838-residential-generators- 3rd-edition-us-market-world.html
  36. 36. Secure File Management Using the Public Cloud Masters of Cybersecurity Practicum Project, ISM 6905– Cecil Thornhill Masters Project CThornhill v2 final.docx7/13/16 Page 36 of 46 25. Barroso, L. A., Clidaras, J., & Hölzle, U. (2013). The datacenter as a computer: An introduction to the design of warehouse-scale machines. Synthesis lectures on computer architecture, 8(3), 1-154. 26. West, B. C. (2014). Factors That Influence Application Migration To Cloud Computing In Government Organizations: A Conjoint Approach. 27. Total Cost of Ownership. (2016). Retrieved July 06, 2016, from http://www.backuparchive.awstcocalculator.com/ 28. United States. White House Office, & Obama, B. (2011). International Strategy for Cyberspace: Prosperity, Security, and Openness in a Networked World. White House. 29. Kundra, V. (2011). Federal cloud computing strategy. 30. VanRoekel, S. (2011, December 8). MEMORANDUM FOR CHIEF INFORMATION OFFICERS. Retrieved July 13, 2016, from https://www.fedramp.gov/files/2015/03/fedrampmemo.pdf 31. Code, U. S. (1999). Gramm-Leach-Bliley Act. Gramm-Leach-Bliley Act/AHIMA, American Health Information Management Association. 32. What is Sensitive Data? Protecting Financial Information ... (2008). Retrieved June 19, 2016, from http://ist.mit.edu/sites/default/files/migration/topics/security/pamphlets/ protectingdata.pdf 33. Government Accountability Office (GAO) Report 08-343, Protecting Personally Identifiable Information, January 2008, http://www.gao.gov/new.items/d08343.pdf 34. (Wilshusen, G. C., & Powner, D. A. (2009). Cybersecurity: Continued efforts are needed to protect information systems from evolving threats (No. GAO- 10-230T). GOVERNMENT ACCOUNTABILITY OFFICE WASHINGTON DC.) 35. McCallister, E., Grance, T., & Scarfone, K. (2010, April). Guide to Protecting the Confidentiality of Personally ... Retrieved July 13, 2016, from http://csrc.nist.gov/publications/nistpubs/800-122/sp800-122.pdf 36. Act, A. C. C. O. U. N. T. A. B. I. L. I. T. Y. (1996). Health insurance portability and accountability act of 1996. Public law, 104, 191.
  37. 37. Secure File Management Using the Public Cloud Masters of Cybersecurity Practicum Project, ISM 6905– Cecil Thornhill Masters Project CThornhill v2 final.docx7/13/16 Page 37 of 46 37. Graham, C. M. (2010). HIPAA and HITECH Compliance: An Exploratory Study of Healthcare Facilities Ability to Protect Patient Health Information. Proceedings of the Northeast Business & Economics Association. 38. Anderson, H. (2010, February 8). The Essential Guide to HITECH Act. Retrieved June 19, 2016, from http://www.healthcareinfosecurity.com/essential-guide-to-hitech-act-a- 2053 39. Dimov, I. (2013, June 20). Guiding Principles in Information Security - InfoSec Resources. Retrieved July 09, 2016, from http://resources.infosecinstitute.com/guiding-principles-in-information- security/ 40. Amazon Web Services (AWS) - Cloud Computing Services. (n.d.). Retrieved July 10, 2016, from https://aws.amazon.com/ 41. EC2 Instance Types – Amazon Web Services (AWS). (2016). Retrieved July 10, 2016, from https://aws.amazon.com/ec2/instance-types/ 42. Regions and Availability Zones. (2016, January). Retrieved July 13, 2016, from http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using- regions-availability-zones.html 43. Elastic IP Addresses. (2016). Retrieved July 10, 2016, from http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/elastic-ip- addresses-eip.html 44. AWS | Elastic Load Balancing - Cloud Network Load Balancer. (2016). Retrieved July 10, 2016, from https://aws.amazon.com/elasticloadbalancing/ 45. AWS | Amazon Route 53 - Domain Name Server - DNS Service. (2016). Retrieved July 10, 2016, from https://aws.amazon.com/route53/ 46. SSL Security Solutions. (2016). Retrieved July 10, 2016, from http://www.networksolutions.com/SSL-certificates/index.jsp 47. What is the SSL Certificate Chain? (2016). Retrieved July 10, 2016, from https://support.dnsimple.com/articles/what-is-ssl-certificate-chain/ 48. Ellingwood, J. (2015, January 28). Apache vs Nginx: Practical Considerations | DigitalOcean. Retrieved July 10, 2016, from https://www.digitalocean.com/community/tutorials/apache-vs-nginx- practical-considerations
  38. 38. Secure File Management Using the Public Cloud Masters of Cybersecurity Practicum Project, ISM 6905– Cecil Thornhill Masters Project CThornhill v2 final.docx7/13/16 Page 38 of 46 49. Node.js Introduction. (2016). Retrieved July 10, 2016, from http://www.tutorialspoint.com/nodejs/nodejs_introduction.htm 50. Webberly, W. (2016, May 23). Direct to S3 File Uploads in Node.js | Heroku Dev Center. Retrieved July 12, 2016, from https://devcenter.heroku.com/articles/s3-upload-node#summary 51. Compensation. (2013, October 22). Retrieved July 12, 2016, from http://www.benefits.va.gov/compensation/dbq_disabilityexams.asp
  39. 39. Secure File Management Using the Public Cloud Masters of Cybersecurity Practicum Project, ISM 6905– Cecil Thornhill Masters Project CThornhill v2 final.docx7/13/16 Page 39 of 46 Source Code Listings App.js – this is the server side logic for the project: /* Cecil Thornhill 5/26/2016 Based on code examples and samples from Will Webberly and Amazon for S3 uploads */ /* In learning how to interface to S3 via Node JS and JavaScript I started with code from a tutorial provided by Dr. Will Webberly who was a computer science lecturer at Cardiff University and is now CTO at Simply Di Ideas. Will was kind enough to correspond with my and address questions on the concepts and use cases involved in my project. The original article I referenced is at: https://devcenter.heroku.com/articles/s3-upload-node#initial-setup */ /* This is the main logic for the server side of the proof of concept demo for my project. The code here supports the features required to allow the client to security load a file to the S3 storage site. The simple proof pages and this core logic do not attempt to implement any user authentication, authorization or administration of the site. Those funcitons are pre-selected via the structure of the users and groups built in the S3 interface for this demo. All these aspects would be expected in a more full featured site design, but are not required to establish the functional proof of concept for the main secure upload of files functionality. */ /* Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. */
  40. 40. Secure File Management Using the Public Cloud Masters of Cybersecurity Practicum Project, ISM 6905– Cecil Thornhill Masters Project CThornhill v2 final.docx7/13/16 Page 40 of 46 /* * Import required packages. * Packages should be installed with "npm install". */ /* CT - I am using local variable for the development versions of this demo site. Below I requre dotenv to allow local config management, so this demo can runwithout setting envirionment variables on the server which is the more correct final operations configuration practice on a deployed systems to prevent exposing the values in the open production environment. Of course it is much easier to manage local values from this resource file in the development phase so that is the way I went for the the current demo code. */ var dotenv = require('dotenv'); dotenv.load(); /* To ensure that we got the values we expexted I also show the variables now in process.env - now with the values from the .env added on the console. Of course this is not something to do in the final production system. */ console.log(process.env) const express = require('express'); const aws = require('aws-sdk'); /* * Set-up and run the Express app. CT - note we are ruuning on port 3000 in this case. It is important to foraward your web traffic from the NGINX server to the proper port via setting up the reverse proxy configuration in the NGINX server, so that traffic gets through from the web server to the applicaiton server. */ const app = express(); app.set('views', './views'); app.use(express.static('./public')); app.engine('html', require('ejs').renderFile); app.listen(process.env.PORT || 3000); /* * Load the S3 information from the environment variables.
