A Quick Survey of Open Source Software for PH Organizations, a paper by Massimo Mirabito, MBA (US CDC) and Taha Kass-Hout, MD, MS, 2007
Published in: Technology, Education

A Quick Survey of Open Source Software for PH Organizations
By Massimo Mirabito, MBA (US CDC) and Taha Kass-Hout, MD, MS, 2007

Unstructured Text

1. Lucene: Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java. It is suitable for nearly any application that requires full-text search, especially cross-platform applications. Lucene itself is only an indexing and search library and does not contain crawling or HTML parsing functionality; the Apache project Nutch is based on Lucene and provides this functionality. Lucene provides capabilities to index a variety of document formats.
2. Solr: Solr is an open source enterprise search server based on the Lucene Java search library, with XML/HTTP and JSON APIs, hit highlighting, faceted search, caching, replication, and a web administration interface. Solr is a standalone server with which applications communicate using XML over HTTP to index documents or execute searches. Solr supports a rich schema specification that allows a wide range of flexibility in dealing with different document fields, and has an extensive search plugin API for developing custom search behavior.
3. Nutch: Nutch is an effort to build an open source search engine that uses Lucene Java for its search and index component. The fetcher ("robot" or "web crawler") was written from scratch solely for this project. Nutch has a highly modular architecture that allows developers to create plugins for the following activities: media-type parsing, data retrieval, querying, and clustering. In June 2005, Nutch graduated from the Apache Incubator and became a subproject of Lucene. It is coded entirely in the Java programming language, but its data is written in language-independent formats. In June 2003, a 100-million-page demo system was run successfully.
To meet the multi-machine processing needs of the crawl and index tasks, the Nutch project has also implemented a MapReduce facility and a distributed file system. These two facilities have been spun out into their own subproject, called Hadoop.
4. UIMA: UIMA stands for Unstructured Information Management Architecture. It is a component software architecture, developed by IBM, for the development, discovery, composition, and deployment of multi-modal analytics for the analysis of unstructured information and its integration with search technologies. The source code for a reference implementation of this framework was made available on SourceForge and later on the Apache Software Foundation website. UIMA is a framework and SDK for developing such applications. An example UIM application might ingest plain text and identify entities, such as persons, places, and organizations, or relations, such as works-for or located-at. UIMA enables such an application to be decomposed into components, for example "language identification" -> "language-specific segmentation" -> "sentence boundary detection" -> "entity detection (person/place names etc.)". Each component must implement interfaces defined by the framework and must provide self-describing metadata via XML descriptor files. The framework manages these components and the data flow between them. Components are written in Java or C++; the data that flows between components is designed for efficient mapping between these languages. UIMA additionally provides capabilities to wrap components as network services, and can scale to very large volumes by replicating processing pipelines over a cluster of networked nodes.

Alternative GIS

A Geographic Information System (GIS) is an equally critical component. A GIS provides a way of capturing, storing, analyzing, and managing data and associated attributes that are spatially referenced to the earth. Additionally, proper data analysis requires time-series support, which lets researchers, first responders, and emergency personnel view data both spatially and over time. The most prominent inexpensive tools on the Internet are Google Maps, Google Earth, Microsoft Live Earth, and Yahoo Maps. All of these tools are relatively easy to use, configure, and distribute. Google Earth and Google Maps are the tools most widely used by web developers. The Keyhole Markup Language (KML) is an XML-based language used to describe geospatial data; KML can be used by both Google Earth and Google Maps.

1. OpenLayers: OpenLayers provides capabilities to embed dynamic maps in any web page. It can display map tiles and markers loaded from a variety of sources. MetaCarta developed the initial version of OpenLayers and gave it to the public to further the use of geographic information of all kinds. OpenLayers is completely free, open source JavaScript, released under the BSD License.
2. MapServer: MapServer is an open source development environment for building spatially enabled Internet applications.
MapServer supports Open Geospatial Consortium (OGC) standards, including Web Map Service (WMS) and Web Feature Service (WFS). MapServer works with PostgreSQL and its PostGIS extension, and supports proprietary GIS formats, including ESRI's Shapefile format. MapServer uses the OGR and GDAL libraries to translate files from one format to another. MapServer supports PHP, Python, Perl, Ruby, Java, and C# for scripting and customization.
3. GeoServer: GeoServer is an open source server that connects information to the Geospatial Web, including publishing and editing data using open standards. It is a fully functional geospatial web service implementing the WMS 1.1.1 and WFS 1.0 implementation specifications from OGC. Information is made available in a large variety of formats as maps/images or actual geospatial data. GeoServer's transactional capabilities offer robust support for shared editing. GeoServer's focus is ease of use and support for standards, so that it can serve as glue for the geospatial web, connecting legacy databases to many diverse clients.
4. GeoTools: GeoTools is an open source (LGPL) Java code library that provides standards-compliant methods for the manipulation of geospatial data, for example to implement Geographic Information Systems (GIS). The GeoTools library implements Open Geospatial Consortium (OGC) specifications as they are developed, in close collaboration with the GeoAPI and GeoWidgets projects.

Enterprise Service Bus (ESB)

Application integration is one of the most challenging aspects of building a platform. An ESB is a middleware infrastructure that connects multiple systems via standard protocols, exposes services for consumption, provides messaging, transformation, and routing capabilities, and leverages existing IT assets. There are several open source ESB products:

1. ServiceMix: ServiceMix is an open source ESB that combines the functionality of a Service Oriented Architecture (SOA) and an Event Driven Architecture (EDA) to create an agile, enterprise ESB. Apache ServiceMix is an open source distributed ESB built from the ground up on the Java Business Integration (JBI) specification, JSR 208, and released under the Apache license. The goal of JBI is to allow components and services to be integrated in a vendor-independent way, allowing users and vendors to plug and play. ServiceMix is lightweight and easily embeddable, has integrated Spring support, and can be run at the edge of the network (inside a client or server), as a standalone ESB provider, or as a service within another ESB.
2. Mule: Mule is a lightweight messaging framework. It is a highly distributable object broker that can seamlessly handle interactions with other applications using disparate technologies, transports, and protocols. The Mule framework provides a highly scalable environment in which you can deploy your business components. Mule manages all the interactions between components transparently, whether they exist in the same VM or over the Internet, and regardless of the underlying transport used. Common scenarios for using Mule include integration projects in which two or more existing systems need to communicate with each other, applications that need to be totally decoupled from their surrounding environment, and systems in which one or more components must be able to scale.
3. FUSE ESB: FUSE ESB is an open source product based on Apache ServiceMix and offered by IONA Technologies. FUSE ESB provides a standardized methodology, server, and tools to deploy integration components, freeing architects from the dependencies that have traditionally locked enterprises into proprietary middleware stacks. FUSE ESB enables organizations to achieve their service-oriented architecture (SOA) objectives with a proven open source solution for enterprise integration.
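
The routing, transformation, and decoupling ideas described for an ESB can be illustrated with a minimal in-process sketch. It is not the API of ServiceMix, Mule, or FUSE ESB; the class MiniBus, its subscribe and publish methods, the topic name, and the message fields are all hypothetical names chosen for illustration only.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List, Tuple

Message = Dict[str, Any]
Handler = Callable[[Message], Any]
Transformer = Callable[[Message], Message]


class MiniBus:
    """Toy bus (hypothetical): routes messages by topic and transforms them per consumer."""

    def __init__(self) -> None:
        self._routes: Dict[str, List[Tuple[Transformer, Handler]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler,
                  transform: Transformer = lambda m: m) -> None:
        # Each consumer registers the message shape it wants via `transform`.
        self._routes[topic].append((transform, handler))

    def publish(self, topic: str, message: Message) -> int:
        # The producer never refers to its consumers; the bus does the routing.
        delivered = 0
        for transform, handler in self._routes.get(topic, []):
            handler(transform(dict(message)))  # copy keeps consumers isolated
            delivered += 1
        return delivered


def to_registry_format(msg: Message) -> Message:
    # Hypothetical transformation between two systems' message formats.
    return {"patient_id": msg["id"], "result": msg["test_result"].upper()}


lab_inbox: List[Message] = []
registry_inbox: List[Message] = []

bus = MiniBus()
bus.subscribe("lab.result", lab_inbox.append)
bus.subscribe("lab.result", registry_inbox.append, transform=to_registry_format)
count = bus.publish("lab.result", {"id": "p-17", "test_result": "positive"})
print(count, lab_inbox, registry_inbox)
```

The producer publishes one message and each consumer receives it in its own format, which is the kind of decoupling the products above aim to provide across processes, transports, and protocols rather than inside a single program.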

Scalability

Scalability is important when deploying solutions that need to perform adequately under high volume. Scalability is the ability to maintain availability, reliability, and performance as the number of concurrent connections and the load progressively increase. A system can scale in two ways:

  • Scale vertically: To scale vertically (or scale up) means to add resources to a single server, typically by adding CPUs or memory. It can also mean increasing the number of running processes.
  • Scale horizontally: To scale horizontally (or scale out) means to add more servers to a system, such as adding a new computer to a distributed software application. An example might be scaling out from 1 web server to 3.

The following products can deliver high availability and clustered solutions:

1. Open Terracotta: Open Terracotta is open source JVM-level clustering software for Java. It delivers clustering as a runtime infrastructure service, which simplifies the task of clustering a Java application. The capability is provided by clustering the JVM underneath the application instead of clustering the application itself.
2. GridGain: GridGain is a computational grid framework. Its goal is to improve the general performance of processing-intensive applications by splitting and parallelizing the workload. In many cases GridGain is used to achieve better overall throughput, scalability, or availability of services. GridGain supports the following out of the box: JBoss, Spring, Spring AOP, JBoss AOP, AspectJ, JGroups, Weblogic, Websphere, Oracle Coherence, Mule, JXInsight, and GigaSpaces.
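
The idea of splitting and parallelizing a workload, and of scaling out by adding workers, can be sketched in a few lines. This is only an illustration using Python's standard library, not GridGain's or Terracotta's API: threads in one process stand in for grid nodes, and the function names split, sum_of_squares, and run_split_job are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence


def split(items: Sequence[int], parts: int) -> List[Sequence[int]]:
    """Divide items into at most `parts` contiguous chunks of near-equal size."""
    parts = max(1, min(parts, len(items) or 1))
    size, extra = divmod(len(items), parts)
    chunks, start = [], 0
    for i in range(parts):
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def sum_of_squares(chunk: Sequence[int]) -> int:
    # The unit of work that each (stand-in) node performs on its chunk.
    return sum(x * x for x in chunk)


def run_split_job(items: Sequence[int], workers: int,
                  task: Callable[[Sequence[int]], int]) -> int:
    """Split the workload, run the chunks concurrently, and combine the partial results."""
    chunks = split(items, workers)
    with ThreadPoolExecutor(max_workers=len(chunks)) as pool:
        partials = list(pool.map(task, chunks))
    return sum(partials)


data = list(range(1, 101))
one_worker = run_split_job(data, 1, sum_of_squares)
three_workers = run_split_job(data, 3, sum_of_squares)
print(one_worker, three_workers)
```

Going from 1 worker to 3 changes only the workers argument, which mirrors the scale-out example of moving from 1 web server to 3. The sketch shows the structure of a split job rather than a measured speedup; a real grid distributes the chunks across machines.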