Vigneshram murugan cloud project_documentation

News Aggregator Web Application on PaaS and IaaS
Vigneshram Murugan (murug1v), Shreya Reddy Muddasani (mudda1s)
Part I: Abstract
The cloud gives anyone who wants to launch a startup opportunities that were not available before.
This document describes the development and deployment of two web applications that query news
sources such as the NY Times, BuzzFeed and a few other sites through their web APIs to collect the
latest headlines and trending news. Each application aggregates the news, stores it in its
datastore or database, and posts the headlines on the application's web site. We implemented the
application on both Platform as a Service (Google App Engine) and Infrastructure as a Service (an
Amazon Web Services EC2 instance). Implementing it on two platforms showed us the pros and cons
of each.
Part II: Implementation
News API:
These applications use the RESTful APIs of the news sites to retrieve data in JSON format. The URL
used to retrieve information consists of two parts: (1) the address part and (2) the API key part.
https://api.nytimes.com/svc/topstories/v2/home.json?apikey=fc8222667db44088805d9d95dfc9c06e
In general, an API key must be requested from the news provider before its API can be accessed.
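The two-part URL described above can be assembled as in the minimal sketch below. The class name
NewsApiUrl, the method build and the placeholder key are assumptions made for illustration; only
the address and the apikey parameter name come from the URL shown above.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the original project.
final class NewsApiUrl {
    // Joins the address part and the API key part into one request URL.
    static String build(String address, String apiKey) {
        // Encode the key so that special characters cannot break the query string.
        String encodedKey = URLEncoder.encode(apiKey, StandardCharsets.UTF_8);
        return address + "?apikey=" + encodedKey;
    }

    public static void main(String[] args) {
        String address = "https://api.nytimes.com/svc/topstories/v2/home.json";
        // YOUR_API_KEY is a placeholder for the key issued by the news provider.
        System.out.println(build(address, "YOUR_API_KEY"));
    }
}
```

Keeping the key out of the address string makes it easier to load the key from configuration
instead of hard-coding it.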
Technologies Used:
The first application was developed in Java using Eclipse 4.6.0 with the Google plugin. It uses
Google’s NoSQL Datastore to store all application data, and uses the Task Queue, Memcache and
Cron features to enhance the application. The main reason for choosing Google App Engine is that
the application scales easily with traffic: it has a built-in load balancer that distributes
incoming requests across instances. Memcache, provided by Google, is a distributed in-memory data
cache; using it improves performance when retrieving data. The application uses the Task Queue to
load news data and images into the Datastore as small, discrete tasks. For this project we
deployed using the 60-day free trial of Google Cloud services.
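The performance benefit of a cache such as Memcache can be illustrated with the cache-aside
pattern: look in the cache first and fall back to the slower store only on a miss. The sketch
below is an assumption-laden illustration, not the project's code: a ConcurrentHashMap stands in
for Memcache, a function stands in for the Datastore, and the class name CacheAside is
hypothetical.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical stand-in: an in-process map plays the role of Memcache.
final class CacheAside {
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    private final Function<String, String> slowStore; // stands in for the Datastore
    private int storeReads = 0;

    CacheAside(Function<String, String> slowStore) {
        this.slowStore = slowStore;
    }

    // Returns the cached value if present; otherwise reads the store and caches the result.
    String get(String key) {
        String cached = cache.get(key);
        if (cached != null) {
            return cached;
        }
        storeReads++;
        String value = slowStore.apply(key);
        if (value != null) {
            cache.put(key, value);
        }
        return value;
    }

    // Number of times the slow store was actually consulted.
    int storeReads() {
        return storeReads;
    }

    public static void main(String[] args) {
        CacheAside news = new CacheAside(key -> "headlines for " + key);
        news.get("nytimes");
        news.get("nytimes"); // served from the cache, no second store read
        System.out.println("store reads: " + news.storeReads());
    }
}
```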
The second application was developed with the Java Spring Boot framework using Eclipse 4.6.0 with
the Spring plugin. The application's data is stored in a database. We chose Spring Boot because it
provides dependency injection between class objects, which keeps the classes loosely coupled.
Loose coupling is especially valuable when the application is deployed in a cloud container as a
microservice. Spring Boot also provides many libraries for handling JSON data and RESTful
services. The application uses the Java Persistence API (JPA) for object-relational mapping: it
manages the relational data in the database and can automatically create a table in the database
schema for each entity class. We also implemented a cache for storing and retrieving data in this
application, similar in purpose to Memcache. We deployed this application on Amazon Web Services
using an EC2 Ubuntu instance. This Infrastructure as a Service offering is more scalable than a
traditional LAMP stack server. It is inexpensive, and a server takes only a few minutes to start.
Its elastic scaling adjusts computing power according to demand. AWS EC2 comes with many tools
and monitoring applications. For this project we used a t2.micro instance. The application could
also be deployed easily on the AWS EC2 Container Service as a container.
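The loose coupling that dependency injection provides can be shown without any framework. In the
sketch below the service receives its collaborator through the constructor, so a stub can be
swapped in without touching the service. All names (HeadlineSource, HeadlineService,
DependencyInjectionSketch) are hypothetical; in Spring Boot the container would supply the
dependency instead of the main method.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical abstraction over a news provider.
interface HeadlineSource {
    List<String> latestHeadlines();
}

// The service depends only on the interface, never on a concrete provider.
final class HeadlineService {
    private final HeadlineSource source;

    // Constructor injection: the caller (or a DI container) supplies the dependency.
    HeadlineService(HeadlineSource source) {
        this.source = source;
    }

    int countHeadlines() {
        return source.latestHeadlines().size();
    }
}

final class DependencyInjectionSketch {
    public static void main(String[] args) {
        // A stub source is injected here; a real HTTP-backed source could replace it unchanged.
        HeadlineSource stub = () -> Arrays.asList("Headline A", "Headline B");
        HeadlineService service = new HeadlineService(stub);
        System.out.println(service.countHeadlines());
    }
}
```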
Part III: Design:
News Aggregator on PaaS:
This application consists of three servlets:
1. Newsaggregator1Servlet
2. JsonFetcher
3. memcached
When the program first runs, JsonFetcher.java, which holds the URL and API key for the news site,
fetches the JSON from the New York Times. The response is read through a BufferedReader. We use
the Gson library to parse this data: JsonParser parses it and stores the data as JsonElement
values. These elements are converted into JsonObject instances for easy retrieval, and further
into a JsonArray where required. We then retrieve each required field by specifying its name and
store it in a String.

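// Requires java.io.BufferedReader, java.io.InputStreamReader, java.net.URL and com.google.gson.*
String strURL =
"https://api.nytimes.com/svc/topstories/v2/home.json?apikey=fc8222667db44088805d9d95dfc9c06e";
URL url = new URL(strURL);
BufferedReader reader = new BufferedReader(new InputStreamReader(url.openStream()));
// Parse the response and take the first article from the "results" array.
JsonParser pars = new JsonParser();
JsonElement je = pars.parse(reader);
JsonObject jo = je.getAsJsonObject();
JsonArray ja = jo.getAsJsonArray("results");
JsonObject jo1 = ja.get(0).getAsJsonObject();
// getAsString() returns the bare text; toString() would keep the surrounding JSON quotes.
String a1s = jo1.get("section").getAsString();
String a1t = jo1.get("title").getAsString();
String a1p = jo1.get("published_date").getAsString();
String a1a = jo1.get("abstract").getAsString();
String a1u = jo1.get("url").getAsString();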
These strings are collected into arrays and passed to the Task Queue class UpLoadImage():
queue.add(TaskOptions.Builder
.withPayload(new UpLoadImage(Article9[0], Article9[1], Article9[2], Article9[3], Article9[4]))
.etaMillis(System.currentTimeMillis() + DELAY_MS));
Each call to this class is scheduled with a delay of 5 seconds. When it runs, it reads the
parameters passed to it and obtains an instance for accessing the Google Datastore:
DatastoreService ds1 = DatastoreServiceFactory.getDatastoreService();
Next, a parent entity ArticlesDB is created with the key name vigneshram, together with a child
entity ArticlesphotoDB. All the data is loaded into ArticlesphotoDB using setProperty() and
put():
// Parent entity with a fixed key name; the child entity is created under the parent's key.
Entity article = new Entity("ArticlesDB", "vigneshram");
Entity photo = new Entity("ArticlesphotoDB", article.getKey());
photo.setProperty("section", sec);
photo.setProperty("title", titl);
photo.setProperty("published_date", publ);
photo.setProperty("abstract", abst);
photo.setProperty("url", ur);
photo.setProperty("timestamp", new Date());
// Persist the child entity.
ds1.put(photo);
A cron job calls this servlet every two minutes to refresh the news from the news API. The
schedule is specified in the cron.xml file:
<?xml version="1.0" encoding="UTF-8"?>
<cronentries>
  <cron>
    <url>/jfcron</url>
    <target>beta</target>
    <description>testing cron</description>
    <schedule>every 2 minutes</schedule>
    <retry-parameters>
      <min-backoff-seconds>2.5</min-backoff-seconds>
      <max-doublings>5</max-doublings>
    </retry-parameters>
  </cron>
</cronentries>
Newsaggregator1servlet.java retrieves the stored data from the Datastore using a query sorted by
timestamp, newest first. The results are limited to 10.
Query q1 = new Query("ArticlesphotoDB");
q1.addSort("timestamp", Query.SortDirection.DESCENDING);
PreparedQuery pq1 = ds.prepare(q1);
QueryResultList<Entity> result = pq1.asQueryResultList(FetchOptions.Builder.withLimit(10));
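The effect of this query (sort by timestamp in descending order, then keep at most 10 results)
can be reproduced on an in-memory list, which may help when testing the display logic without a
Datastore. This is only a sketch of the query's semantics; the LatestArticles and Article names
are hypothetical and no App Engine API is used.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical in-memory model of the "newest first, limit N" query.
final class LatestArticles {
    static final class Article {
        final String title;
        final long timestamp;

        Article(String title, long timestamp) {
            this.title = title;
            this.timestamp = timestamp;
        }
    }

    // Sorts by timestamp descending and keeps at most `limit` articles.
    static List<Article> latest(List<Article> all, int limit) {
        return all.stream()
                .sorted(Comparator.comparingLong((Article a) -> a.timestamp).reversed())
                .limit(limit)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Article> all = new ArrayList<>();
        for (int i = 1; i <= 12; i++) {
            all.add(new Article("Article " + i, i));
        }
        // Prints the 10 newest articles, starting with "Article 12".
        for (Article a : latest(all, 10)) {
            System.out.println(a.title);
        }
    }
}
```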
This page also has a page counter whose counter variable is stored in Memcache.

Finally, the memcached.java servlet retrieves the Datastore entries and caches them. The cache is
then read to convert the data into JSON format, which feeds the UI or any future application or
interface. This page also contains a Memcache-based page counter.
Query q1 = new Query("ArticlesphotoDB");
q1.addSort("timestamp", Query.SortDirection.DESCENDING);
PreparedQuery pq1 = ds.prepare(q1);
QueryResultList<Entity> result = pq1.asQueryResultList(FetchOptions.Builder.withLimit(10));
// ey is an entity declared elsewhere in the servlet (declaration not shown here);
// its parent key is used as the cache key.
Key k = ey.getParent();
MemcacheService cache = MemcacheServiceFactory.getMemcacheService();
cache.put(k, result);
// Serialize the cached result to JSON and write it to the response.
Gson gson = new Gson();
String data = gson.toJson(cache.get(k));
response.setContentType("application/json");
response.getWriter().println(data);
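The page counter mentioned above is not shown in the source. A minimal sketch of the idea follows,
assuming one counter per page name; a ConcurrentHashMap stands in for Memcache, and the
PageCounter class is hypothetical. A counter kept in a real cache can be reset when the entry is
evicted, so it should be treated as approximate.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical stand-in for a cache-backed page counter.
final class PageCounter {
    private final ConcurrentMap<String, Long> counters = new ConcurrentHashMap<>();

    // Atomically adds one to the counter for the page and returns the new value.
    long increment(String page) {
        return counters.merge(page, 1L, Long::sum);
    }

    // Returns the current count, or 0 if the page has never been counted.
    long current(String page) {
        return counters.getOrDefault(page, 0L);
    }

    public static void main(String[] args) {
        PageCounter counter = new PageCounter();
        counter.increment("/newsaggregator");
        counter.increment("/newsaggregator");
        System.out.println(counter.current("/newsaggregator"));
    }
}
```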
News Aggregator on IaaS:
This application has an application.properties file that holds the configuration data for JPA
and the news API URLs and their keys. When the application runs, NewsAggregatorApplication.java
starts FeedController.java through the @SpringBootApplication annotation. WebMvcConfig.java maps
the /home URL to index.html:
registry.addViewController("/home").setViewName("home/index");
The FeedController file uses RestTemplate objects that fetch the JSON and store its values in
BuzzFeed class objects, which in turn are mapped to the buzzfeed database through JPA/Hibernate.
buzzFeedRepository forwards all the data to the user interface as a list. Newsapi is updated in
the same way. This function is called every 2 hours by a scheduled cron job:
@RequestMapping(value = "/get-buzz-feed", method = RequestMethod.GET)
public BuzzFeed getNewsFromBuzzFeed() throws Exception {
    RestTemplate buzzFeedTemplate = new RestTemplate();
    BuzzFeed buzzFeed = buzzFeedTemplate.getForObject(buzzFeedUrl, BuzzFeed.class);
    saveBuzzFeedData(buzzFeed);
    return buzzFeed;
}

public void saveBuzzFeedData(BuzzFeed buzzFeed) throws Exception {
    buzzFeedRepository.save(buzzFeed);
}

@Scheduled(cron = "0 0 0/2 * * ?")
public void pullNewsFeedFromBuzzFeed() throws Exception {
    BuzzFeed buzzFeed = getNewsFromBuzzFeed();
}
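The Spring cron expression "0 0 0/2 * * ?" has six fields: second, minute, hour, day of month,
month and day of week. The hour field 0/2 means "starting at hour 0, every 2 hours". The sketch
below expands such a field into the hours it matches; the CronHourField class is hypothetical and
handles only the start/step form, not the full cron syntax.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that expands an hour field of the form "start/step".
final class CronHourField {
    static List<Integer> expand(String field) {
        String[] parts = field.split("/");
        int start = Integer.parseInt(parts[0]);
        int step = Integer.parseInt(parts[1]);
        List<Integer> hours = new ArrayList<>();
        for (int h = start; h < 24; h += step) {
            hours.add(h);
        }
        return hours;
    }

    public static void main(String[] args) {
        String cron = "0 0 0/2 * * ?";
        // Fields: second, minute, hour, day of month, month, day of week.
        String hourField = cron.split(" ")[2];
        // Prints [0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22]
        System.out.println(expand(hourField));
    }
}
```

With second and minute both fixed at 0, the scheduled method fires on the hour at each of these
twelve times per day.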
All the classes in com.news.domain and com.news.repository map the JSON values to their
corresponding tables using JPA.
Part IV: Deployment:
Deployment in Google App Engine:
This application is deployed on Google App Engine. It was initially developed with Java 1.8 but
was converted to Java 1.7 before deployment.
A look into Google App Engine:
[Figure: Datastore]
[Figure: Cron-job log]
Link to the Google App Engine site:
http://1-dot-newsaggregator-151603.appspot.com/newsaggregator
Deployment in AWS:
This application is deployed on AWS Elastic Compute Cloud (EC2) on an Ubuntu instance. Once the
instance is created, the Tomcat server and the JDK are installed on it. The WAR file is copied to
Tomcat's webapps directory and the server is started using sh startup.sh. Any traditional
application can be deployed on cloud IaaS in this way.

[Figure: The AWS EC2 console]
[Figure: The instance monitoring console]
AWS URL to the deployed application:
http://ec2-35-165-238-110.us-west-2.compute.amazonaws.com:8080/news-crawl/home

Part V: File Link:
Google App Engine - News Aggregator:
https://drive.google.com/open?id=0B7oMKXmmPVhFekJ1cVZFaG14RU0
AWS - News Aggregator:
https://drive.google.com/open?id=0B7oMKXmmPVhFakxQY1NieFdhcVE
Part VI: Running on local Machine:
To run the Google App Engine code on localhost, install Eclipse and the Google plugin, then
choose Run as Web Application.
To run the AWS-based application on localhost, install Eclipse and the Spring framework plugin,
then choose Run as Java Application. Also install MySQL Workbench and create a schema named
“news”.
Part VII: Future Implementation:
The Spring-based application can easily be deployed inside a container.
Part VIII: Tutorial:
https://www.youtube.com/watch?v=KxlCnYLOjSQ
Part IX: Conclusion:
Implementing a web application on IaaS and on PaaS each has its own pros and cons. PaaS provides
more scalability, whereas IaaS is more traditional, easy to build on and supports many external
frameworks. The Datastore is easy to implement, whereas a traditional database does not scale as
readily. The deployment model can be chosen according to the demands of the application.
Part X: References:
https://cloud.google.com/appengine/
https://docs.oracle.com/javaee/6/tutorial/doc/bnbpz.html
https://aws.amazon.com/ec2/
https://aws.amazon.com/documentation/ec2/
https://github.com/GoogleCloudPlatform/java-docs-samples/blob/master/taskqueue/deferred/src/main/java/com/google/cloud/taskqueue/samples/DeferSampleServlet.java
https://docs.spring.io/spring-boot/docs/current/reference/html/
http://docs.spring.io/spring/docs/5.0.0.M2/spring-framework-reference/htmlsingle/#mvc-servlet
http://stackoverflow.com/questions/16329657/tomcat-deploy-to-remote-server-with-war-file-as-url
https://cloud.google.com/appengine/docs/java/
http://stackoverflow.com/search?q=google+app+engine
https://api.nytimes.com/svc/topstories/v2/home.json?apikey=fc8222667db44088805d9d95dfc9c06e
http://developer.nytimes.com/top_stories_v2.json
https://sites.google.com/site/gson/gson-user-guide
