The software development process described here supports computer project analysis and is important to the evaluation of the proposed project.
1. Introduction
1.1. Purpose
1.2. Scope
These practice guidelines are for those who manage big data and big data analytics
projects or are responsible for the use of data analytics solutions. They are also intended
for business and program leaders who are responsible for developing agency capability in
the area of big data and big data analytics.
For agencies not currently using big data or big data analytics, this document may
assist strategic planners, business teams and data analysts to consider the value of big
data to their current and future programs.
This document is also relevant to those in industry, research and academia who can
work as partners with government on big data analytics projects.
Technical APS personnel who manage big data and/or perform big data analytics are invited to
join the Data Analytics Centre of Excellence Community of Practice to share information
about technical aspects of big data and big data analytics, including achieving best practice
in modelling and related requirements. To join the community, send an email to the
Data Analytics Centre of Excellence.
1.3. Definitions, acronyms & abbreviations
What is Big Data?
Every day, we create 2.5 quintillion bytes of data — so much that 90% of the data
in the world today has been created in the last two years alone.
Gartner defines Big Data as high-volume, high-velocity and high-variety information
assets that demand cost-effective, innovative forms of information processing for
enhanced insight and decision making.
According to IBM, 80% of data captured today is unstructured, from sensors used
to gather climate information, posts to social media sites, digital pictures and
videos, purchase transaction records, and cell phone GPS signals, to name a few.
All of this unstructured data is Big Data.
What does Hadoop solve?
Organizations are discovering that important predictions can be made by sorting
through and analyzing Big Data.
However, since 80% of this data is “unstructured”, it must be formatted (or
structured) in a way that makes it suitable for data mining and subsequent
analysis.
Hadoop is the core platform for structuring Big Data, and solves the problem of
making it useful for analytics purposes.
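Hadoop imposes structure through the MapReduce model: a map step turns raw records into key-value pairs, and a reduce step aggregates them. The sketch below illustrates that idea in plain Java with a word count; the class and method names are ours, and it does not use the Hadoop API itself, which wraps the same logic in Mapper/Reducer classes running on a cluster.

```java
import java.util.*;

// Minimal plain-Java illustration of the map/shuffle/reduce model Hadoop
// uses to impose structure on raw text. Not the Hadoop API itself.
public class MapReduceSketch {

    // "Map" phase: turn one line of unstructured text into (word, 1) pairs.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\W+")) {
            if (!word.isEmpty()) {
                pairs.add(new AbstractMap.SimpleEntry<>(word, 1));
            }
        }
        return pairs;
    }

    // "Shuffle + reduce" phase: group pairs by key and sum the counts.
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> counts = new HashMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            counts.merge(p.getKey(), p.getValue(), Integer::sum);
        }
        return counts;
    }

    public static Map<String, Integer> wordCount(List<String> lines) {
        List<Map.Entry<String, Integer>> all = new ArrayList<>();
        for (String line : lines) {
            all.addAll(map(line));   // map each input line independently
        }
        return reduce(all);          // aggregate the intermediate pairs
    }

    public static void main(String[] args) {
        Map<String, Integer> c = wordCount(Arrays.asList("big data", "big deal"));
        System.out.println(c);       // e.g. {big=2, data=1, deal=1}
    }
}
```

In real Hadoop the map calls run in parallel across the cluster nodes holding the data, which is what makes the approach scale.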
Where does Zettaset fit?
Zettaset makes you the expert at navigating the complexities of Hadoop.
Zettaset has created Orchestrator™, an enterprise management software solution
that addresses the common issues of Hadoop deployment with sophisticated and
easy-to-use interfaces and tools.
Orchestrator software eliminates the hassle of installing and managing Hadoop,
while giving you greater control over your Big Data environment with enterprise-
class features that address security, high availability, and performance.
1.4. References
[1] R. Ahmed and G. Karypis, “Algorithms for Mining the Evolution of Conserved
Relational States in Dynamic Networks,” Knowledge and Information Systems, vol. 33,
no. 3, pp. 603-630, Dec. 2012.
[2] M.H. Alam, J.W. Ha, and S.K. Lee, “Novel Approaches to Crawling Important Pages
Early,” Knowledge and Information Systems, vol. 33, no. 3, pp. 707-734, Dec. 2012.
[3] S. Aral and D. Walker, “Identifying Influential and Susceptible Members of Social
Networks,” Science, vol. 337, pp. 337-341, 2012.
[4] A. Machanavajjhala and J.P. Reiter, “Big Privacy: Protecting Confidentiality in Big
Data,” ACM Crossroads, vol. 19, no. 1, pp. 20-23, 2012.
[5] S. Banerjee and N. Agarwal, “Analyzing Collective Behavior from Blogs Using
Swarm Intelligence,” Knowledge and Information Systems, vol. 33, no. 3, pp. 523-547,
Dec. 2012.
[6] E. Birney, “The Making of ENCODE: Lessons for Big-Data Projects,” Nature, vol.
489, pp. 49-51, 2012.
[7] J. Bollen, H. Mao, and X. Zeng, “Twitter Mood Predicts the Stock Market,” J.
Computational Science, vol. 2, no. 1, pp. 1-8, 2011.
[8] S. Borgatti, A. Mehra, D. Brass, and G. Labianca, “Network Analysis in the Social
Sciences,” Science, vol. 323, pp. 892-895, 2009.
[9] J. Bughin, M. Chui, and J. Manyika, Clouds, Big Data, and Smart Assets: Ten Tech-
Enabled Business Trends to Watch. McKinsey Quarterly, 2010.
[10] D. Centola, “The Spread of Behavior in an Online Social Network Experiment,”
Science, vol. 329, pp. 1194-1197, 2010.
1.5. Overview
Hadoop is 100% open source, and pioneered a fundamentally new way of
storing and processing data. Instead of relying on expensive, proprietary hardware and
different systems to store and process data, Hadoop enables distributed parallel
processing of huge amounts of data across inexpensive, industry-standard servers that
both store and process the data, and can scale without limits. With Hadoop, no data is too
big. And in today’s hyper-connected world where more and more data is being created
every day, Hadoop’s breakthrough advantages mean that businesses and organizations
can now find value in data that was recently considered useless. Big Data concerns large-
volume, complex, growing data sets with multiple, autonomous sources. With the fast
development of networking, data storage, and the data collection capacity, Big Data are
now rapidly expanding in all science and engineering domains, including physical,
biological and biomedical sciences. This document presents a HACE theorem that
characterizes the features of the Big Data revolution, and proposes a Big Data processing
model, from the data mining perspective. This data-driven model involves demand-driven
aggregation of information sources, mining and analysis, user interest modeling, and
security and privacy considerations. We analyze the challenging issues in the data-driven
model and also in the Big Data revolution.
2. Overall description
2.1. Product perspective
2.1.1. System interfaces
Windows XP operating system
Windows 7 operating system
2.1.2. User interfaces
The user interface is built using JSP and Servlets.
We follow a 3-tier architecture: DAL (data access layer), BLL (business logic layer) and UI (user interface).
Behavioural patterns are used as the design standard.
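As a rough illustration of that 3-tier separation (the class names and the in-memory data store here are invented for the example, not taken from the project code):

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of a 3-tier separation: the UI layer calls the business logic
// layer (BLL), which calls the data access layer (DAL). All names are
// illustrative; the DAL is an in-memory stub standing in for a database.
public class ThreeTierSketch {

    // Data access layer: hides where the data actually lives.
    public static class PolicyDal {
        private final Map<Integer, String> table = new HashMap<>();
        public void save(int id, String holder) { table.put(id, holder); }
        public String find(int id) { return table.get(id); }
    }

    // Business logic layer: enforces rules before touching the DAL.
    public static class PolicyBll {
        private final PolicyDal dal = new PolicyDal();
        public String register(int id, String holder) {
            if (holder == null || holder.trim().isEmpty()) {
                return "ERROR: holder name required";   // validation rule
            }
            dal.save(id, holder.trim());
            return "OK";
        }
        public String lookup(int id) { return dal.find(id); }
    }

    // UI layer: in a JSP/Servlet application this would render HTML;
    // here it just formats a plain string.
    public static String uiRegister(PolicyBll bll, int id, String holder) {
        return "Result: " + bll.register(id, holder);
    }

    public static void main(String[] args) {
        PolicyBll bll = new PolicyBll();
        System.out.println(uiRegister(bll, 1, "A. Smith"));  // Result: OK
        System.out.println(bll.lookup(1));                   // A. Smith
    }
}
```

The benefit of the layering is that the UI never touches storage directly, so the DAL can later be swapped for a real JDBC-backed implementation without changing the upper tiers.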
2.1.3. Hardware interfaces
Processor: Intel Pentium 4 or above
Memory: 512 MB or above
Other peripherals: Printer
Hard disk: 10 GB
2.1.4. Software interfaces
Technologies and tools used in the Policy system project are as follows:
Technology used:
Front End
JDK 1.6.0
Netbeans 6.9.1
Internet Explorer 6.0/above
Back-End
MySQL 5.1
2.1.5. Communications interfaces
We use the TCP/IP protocol for establishing connections and transmitting data over the
network. We use Ethernet for the LAN.
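A minimal sketch of how such a TCP connection could be established and used in Java, here with a loopback echo server so it is self-contained (the port is chosen by the OS and the message is arbitrary; a real deployment would connect to a remote host):

```java
import java.io.*;
import java.net.*;

// Minimal sketch of TCP/IP connection establishment and data transfer,
// using a loopback echo server. Port 0 asks the OS for any free port.
public class TcpEchoSketch {

    public static String echoOnce(String message) {
        try {
            final ServerSocket server = new ServerSocket(0);  // listen on a free port
            Thread serverThread = new Thread(new Runnable() {
                public void run() {
                    try (Socket client = server.accept();
                         BufferedReader in = new BufferedReader(
                                 new InputStreamReader(client.getInputStream()));
                         PrintWriter out = new PrintWriter(client.getOutputStream(), true)) {
                        out.println(in.readLine());           // echo one line back
                    } catch (IOException ignored) {
                    }
                }
            });
            serverThread.start();

            try (Socket socket = new Socket("127.0.0.1", server.getLocalPort());
                 PrintWriter out = new PrintWriter(socket.getOutputStream(), true);
                 BufferedReader in = new BufferedReader(
                         new InputStreamReader(socket.getInputStream()))) {
                out.println(message);                          // send over TCP
                return in.readLine();                          // read the echo
            } finally {
                serverThread.join();
                server.close();
            }
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(echoOnce("hello"));                 // prints "hello"
    }
}
```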
2.1.6. Memory constraints
Initially the supporting software will use around 20 GB of hard-drive space, and the actual
application will occupy around 300 MB. When the application is deployed on a web
server, we assume 1 GB of space for the website and 500 MB for the database.
2.1.7. Operations
Booting is the process that occurs when you press the power button to turn the computer
on. During this process (which may take a minute or two), the computer does several
things:
It runs tests to make sure everything is working correctly.
It checks for new hardware.
It then starts up the operating system.
2.1.8. Site adaptation requirements
One domain name required
One hosting plan required on a web server
Database space required
A Hadoop server may be required, depending on load balancing
2.2. Product functions
The final product has the functions described below.
The APS Big Data Strategy highlighted the opportunities and benefits of big data more
generally and identified case studies where big data is already being used by
government agencies to deliver benefits. The opportunities include the chance to improve and transform
service delivery, enhance and inform policy development, supplement and enrich official
statistics, provide business and economic opportunities, build skills in a key knowledge
sector, and derive productivity benefits.
Big data is likely to have application in all government agencies now and into the future.
Government agencies will need to consider the extent to which they have the potential to
benefit from using big data and big data analytics and whether they need to build a
capability to do so. Developing a big data capability requires significant commitment of
resources and accompanying shifts in processes, culture and skills. Outside of the usual
factors such as cost and return on investment, the decision to develop a Big Data
capability needs to take into account several factors:
1. Alignment of a big data capability with strategic objectives - consider the extent
to which the agency’s strategic objectives would be supported by a big data capability
across the range of activities and over time.
2. The business model of the agency now and into the foreseeable future – consider
the extent to which the current business model supports and would be supported by big
data capability.
3. Current and future data availability – the extent and range of data sources
available to the agency now, and the potential data sources, their cost, and barriers to
access.
4. Maturity of the available technology and capability – consideration needs to be
given to the extent to which current big data technology and capability can deliver the
intended benefits, gather examples of what has been delivered and the practical
experience of that implementation.
5. Likelihood of accruing benefits during the development of the capability –
consideration needs to be given to whether there is an achievable pathway for developing
a big data capability, the ability to take a stepwise approach and expand the solution
across more aspects of agency activity as the technology is proven.
6. Availability of skilled personnel to manage big data acquisition and analysis and
the organisational environment to support the development of the technology, people and
process capability required to use big data.
Once the strategic need for a big data analytics capability has been identified, it is
recommended that there is a program of big data projects that is prioritised and
implemented.
Architecture
2.3. User characteristics
Goal
To design products that satisfy their target users, a deeper understanding is needed of
their user characteristics and product properties in development related to unexpected
problems users face. These user characteristics encompass cognitive aspect, personality,
demographics, and use behavior. The product properties represent operational
transparency, interaction density, product importance, frequency of use and so on. This
study focuses on how user characteristics and product properties can influence whether
soft usability problems occur, and if so, which types. The study will lead to an interaction
model that provides an overview of the interaction between user characteristics, product
properties, and soft usability problems.
Method and results
In total three surveys and one experiment were conducted. The first survey was a
questionnaire survey to explore what usability problems users experienced in the
Netherlands and South Korea. This study resulted in the categorization of soft usability
problems. The second survey investigated how user characteristics are related to the
occurrence of specific soft usability problems. Finally, an experiment was conducted to
find out how user characteristics are correlated to specific soft usability problems
depending on type of product in the USA, South Korea and the Netherlands. Based on the
findings from the studies, an interaction model (PIP model: Product-Interaction-Persona
model) was developed which provides insight into the interaction between user
characteristics, product properties, and soft usability problems. Based on this PIP model a
workshop and an interactive tool were developed. Companies can use the PIP model to
gain insights into probable usability problems of a product they are developing and the
characteristics of those who would have problems using the product.
Validation
The PIP model was validated in the companies involved in the project to see how it is
used in the product development process and what should be improved. The validation
also included workshops in which designers in the companies could experience and learn
how the findings and the model and the tool are applicable to their design process.
Card set
A card set provides the definitions of three categories of soft usability problems with
examples in actual use and retrospective evaluation, and can be found on the Results
page.
2.4. Design and implementation constraints
The system is implemented in the Java language. We also use the HTTP and TCP/IP protocols.
Java has had a profound effect on the Internet because it expands the universe of objects that
can move about freely on the Internet. There are two types of objects we transmit over the
network: passive and dynamic.
Network programs also present serious problems in the areas of security and portability. When
we download a normal program we risk viral infection. Java provides a firewall to overcome
these problems, addressing them through applets. By using a Java-compatible web
browser we can download Java applets without fear of viral infection.
2.5. Assumptions and dependencies
We assume that there are several servers and clients attached to them.
The user's system supports the TCP/IP protocol suite.
The key considerations of Java are
1. Object-oriented: Java embodies the purist's “everything is an object” paradigm. The Java
object model is simple and easy to extend.
2. Multithreaded: Java supports multithreaded programming, which allows you to write
programs that do many things simultaneously.
3. Architecture-neutral: the main problem facing programmers is that there is no guarantee
that a program written today will run tomorrow, even on the same machine. The Java
language and the JVM solve this problem; their goal is “write once, run anywhere, any
time, forever”.
4. Distributed: Java is designed for the distributed environment of the Internet because it
handles TCP/IP protocols.
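Point 2, multithreading, can be illustrated with a short sketch (purely illustrative, not project code): several threads update a shared counter, with synchronization keeping the total exact.

```java
// Sketch of Java's multithreading support: several threads increment a
// shared counter; the synchronized block keeps the total exact.
public class ThreadSketch {

    private static int counter = 0;
    private static final Object lock = new Object();

    public static int countInParallel(final int threads, final int perThread) {
        try {
            counter = 0;
            Thread[] workers = new Thread[threads];
            for (int i = 0; i < threads; i++) {
                workers[i] = new Thread(new Runnable() {
                    public void run() {
                        for (int j = 0; j < perThread; j++) {
                            synchronized (lock) { counter++; }  // guarded update
                        }
                    }
                });
                workers[i].start();
            }
            for (Thread t : workers) t.join();                  // wait for all
            return counter;
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(countInParallel(4, 1000));           // prints 4000
    }
}
```

Without the synchronized block, concurrent increments could interleave and lose updates, so the final count would be unpredictable.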
The minimum requirements the client should have to establish a connection to a server are as
follows:
Processor: Pentium III
RAM: 128 MB
Hard disk: 2 GB
Web server: Java Web Server
Protocols: TCP/IP
External Interface Requirements
User Interfaces
This includes GUI standards, error messages for invalid inputs by users, standard buttons
and functions that will appear on the screen.
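For instance, an invalid-input error message could be produced by a small validation routine like the following sketch (the field name and rules here are invented examples, not the project's actual GUI standards):

```java
// Illustrative sketch of standard error messages for invalid user input.
// The field ("age") and its rules are example assumptions only.
public class InputValidator {

    // Returns an error message for invalid input, or null when valid.
    public static String validateAge(String raw) {
        if (raw == null || raw.trim().isEmpty()) {
            return "Age is required.";
        }
        try {
            int age = Integer.parseInt(raw.trim());
            if (age < 0 || age > 120) {
                return "Age must be between 0 and 120.";
            }
            return null;                          // input is valid
        } catch (NumberFormatException e) {
            return "Age must be a whole number."; // non-numeric input
        }
    }

    public static void main(String[] args) {
        System.out.println(validateAge("abc"));   // Age must be a whole number.
        System.out.println(validateAge("35"));    // null
    }
}
```

In a JSP/Servlet UI the returned message would be rendered next to the offending field, keeping wording consistent across screens.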
Hardware Interfaces
We use TCP/IP protocol for establishing connection and transmitting data over the
network. We use Ethernet for LAN.
Software Interfaces
We use Oracle to store the database of clients that connect to the server, accessed through
JDBC and ODBC.
Security Requirements
We provide authentication and authorization by passwords for each level of access.
We implement the IDEA algorithm for secure data transmission.
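The round trip of encrypting data before transmission and decrypting it on receipt can be sketched with the standard javax.crypto API. Note that the JDK does not ship an IDEA implementation (it is usually supplied by a third-party provider such as Bouncy Castle), so this sketch substitutes AES purely to show the pattern:

```java
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Encrypt/decrypt round trip using the standard JCE API. AES stands in
// here for IDEA, which requires a third-party provider.
public class CryptoSketch {

    public static byte[] roundTrip(byte[] plaintext) {
        try {
            KeyGenerator gen = KeyGenerator.getInstance("AES");
            gen.init(128);                                   // 128-bit key
            SecretKey key = gen.generateKey();

            Cipher cipher = Cipher.getInstance("AES");       // provider-default mode
            cipher.init(Cipher.ENCRYPT_MODE, key);
            byte[] ciphertext = cipher.doFinal(plaintext);   // encrypt before sending

            cipher.init(Cipher.DECRYPT_MODE, key);
            return cipher.doFinal(ciphertext);               // decrypt on receipt
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        byte[] msg = "confidential".getBytes(StandardCharsets.UTF_8);
        System.out.println(Arrays.equals(msg, roundTrip(msg)));  // true
    }
}
```

In the real system the key would be agreed between sender and receiver in advance rather than generated per call; this sketch only demonstrates that decryption inverts encryption.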
Software Quality Attributes
Product is adaptable to change: for example, it can be modified to transfer not only text
but also image, audio and video files.
Product is reliable due to the file encryption and authentication: data is not lost and does
not fall into the wrong hands.
Product is portable: it can run between just two connected systems or across a large
network of computers.
Product is maintainable: its properties can be changed in future to meet new
requirements.
2.6. Apportioning of requirements
Customer experience strategy: Leverages key insights from digital agency IBM
Interactive to help provide an enhanced multichannel customer experience, user
experience design and full life-cycle development
Existing web experience enhancement: Approaches to better leverage content
management, portals, product catalogs and user experience
Smarter sales and marketing: Techniques from the WebSphere Commerce
development lab and service support to help provide deep integration skills and
faster-to-market deployment
Time to value: Services leveraging prebuilt assets, global talent pools and
accurate estimating tools and techniques to help you get to market faster
3. Specific Requirements
3.1 External interface requirements
3.1.1 User interfaces
The system provides efficient user interfaces, with proper provision for the user to input
data. The user can view the results of the classification, error rates and labelled
accuracy.
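The error rate and labelled accuracy reported by the interface reduce to a comparison of predicted and true labels, as in this small sketch (the sample labels in main are invented):

```java
// Accuracy and error rate over predicted vs. actual labels; the heart of
// the figures shown in the classification UI.
public class ClassificationMetrics {

    public static double accuracy(int[] predicted, int[] actual) {
        if (predicted.length != actual.length || predicted.length == 0) {
            throw new IllegalArgumentException("label arrays must match and be non-empty");
        }
        int correct = 0;
        for (int i = 0; i < predicted.length; i++) {
            if (predicted[i] == actual[i]) correct++;   // count matching labels
        }
        return (double) correct / predicted.length;
    }

    public static double errorRate(int[] predicted, int[] actual) {
        return 1.0 - accuracy(predicted, actual);       // complement of accuracy
    }

    public static void main(String[] args) {
        int[] pred  = {1, 0, 1, 1};
        int[] truth = {1, 0, 0, 1};
        System.out.println(accuracy(pred, truth));      // 0.75
        System.out.println(errorRate(pred, truth));     // 0.25
    }
}
```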
3.1.2 Hardware interfaces
Processor – Pentium P4 or higher version
RAM – 1GB or more
Hard Disk – 40GB or more
3.1.3 Software interfaces
Database – MySQL 5.1
Operating system – Windows XP SP2 or higher
Other software – relational database, JDK 1.6 or higher
3.1.4 Communication interfaces
We use the TCP/IP protocol for establishing connections and transmitting data over the
network. We use Ethernet for the LAN.
3.2 Specific requirements
3.2.1 Sequence diagrams
// Add diagram here
3.2.2 Classes for classification of specific requirements
3.3 Performance requirements
The only way in which systems will meet their performance targets is for those targets to be
specified clearly and unambiguously. It is a simple fact that if performance is not a stated
criterion of the system requirements, then the system designers will generally not consider
performance issues, while loose or incorrectly defined performance specifications can
lead to disputes between clients and suppliers. In many cases performance requirements
are never rigid, as a system that does not fully meet its defined performance requirements
may still be released because of other considerations such as time to market.
In order to assess the performance of a system the following must be clearly specified:
• Response Time
• Workload
• Scalability
• Platform
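Response time, the first of these, is the most directly measurable. One possible way a response-time target could be checked in Java is to time an operation over repeated runs and report the average (the workload in main is a stand-in, not part of the system):

```java
// Times a Runnable over repeated runs and returns the average elapsed
// time in milliseconds; a simple way to check a response-time target.
public class ResponseTimer {

    public static double averageMillis(Runnable operation, int runs) {
        long total = 0;
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            operation.run();
            total += System.nanoTime() - start;   // elapsed wall-clock time
        }
        return total / (runs * 1_000_000.0);      // nanoseconds -> milliseconds
    }

    public static void main(String[] args) {
        double ms = averageMillis(new Runnable() {
            public void run() {
                StringBuilder sb = new StringBuilder();
                for (int i = 0; i < 10_000; i++) sb.append(i);  // stand-in workload
            }
        }, 5);
        System.out.println("average: " + ms + " ms");
    }
}
```

Workload and scalability are then assessed by re-running the same measurement under increasing concurrency, and the platform must be held fixed so results are comparable.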
3.4 Design constraints
Before you start drafting your design considerations, you must know the intended outcomes of
your design opportunity. Some design opportunities are inclined towards improving
certain functions or safety; some are targeted at a specific audience; some aim
to solve a nagging problem that currently seems to have no viable solution; and some
are proposed to make the product fun to use.
Knowing precisely what you want out of your design proposal helps a great deal in drafting a
good set of design considerations, because it means you will be precise in
identifying the areas of consideration. Otherwise you will simply be stating the
obvious universal areas: products should be safe for users, must look good, must
be colourful, and so on.
3.5 Software system attributes
3.5.1 Reliability
Software Reliability is the probability of failure-free software operation for a
specified period of time in a specified environment. Software Reliability is also an
important factor affecting system reliability. It differs from hardware reliability in
that it reflects the design perfection, rather than manufacturing perfection. The high
complexity of software is the major contributing factor of Software Reliability
problems. Software Reliability is not a function of time - although researchers have
come up with models relating the two. The modeling technique for Software
Reliability is reaching its prosperity, but before using the technique, we must
carefully select the appropriate model that can best suit our case. Measurement in
software is still in its infancy. No good quantitative methods have been developed
to represent Software Reliability without excessive limitations. Various approaches
can be used to improve the reliability of software, however, it is hard to balance
development time and budget with software reliability.
3.5.2 Availability
Over-engineering, which is designing systems to specifications better than
minimum requirements.
Duplication, which is extensive use of redundant systems and components.
Recoverability, which is the use of fault-tolerant engineering methods.
Automatic updating, which keeps OSs and applications current without user
intervention.
Data backup, which prevents catastrophic loss of critical information.
Data archiving, which keeps extensive records of data in case of audits or other
recovery needs.
Power-on replacement, which is the ability to hot swap components or
peripherals.
The use of virtual machines, which minimizes the impact of OS or software
faults.
Use of surge suppressors, which minimizes risk of component damage resulting
from power-line anomalies.
Continuous power, which is the use of an uninterruptible power supply to keep
systems operational while switching from commercial power to backup or
auxiliary power.
Backup power sources, which includes batteries and generators to keep systems
operational during extended interruptions in commercial power.
3.5.3 Security
When the security functionality in a proposed product does not satisfy specific security
requirements, the risk introduced must be evaluated and additional controls must be
considered prior to purchasing the product. Where additional functionality is supplied
and causes a security risk, this must be disabled or the proposed control structure must be
reviewed to determine if advantage can be taken of the available enhanced functionality.
Design reviews must be conducted at periodic intervals during the development process
to assure that the proposed design will satisfy the functional and security requirements
specified by the owner.
applying the security requirements to the project and allocating financial,
technical and human resources as required for meeting the security requirements
of the project
ensuring that the security controls are tested and validated during acceptance test
phase
maintaining the security controls throughout the life cycle of the product or the
application
Product or service specifications must include the requirements for security
controls. Contracts with the Providers must also address the identified security
requirements.
3.5.4 Maintainability
The following steps should be undertaken to assess maintainability statically:
A list of maintainability factors to be included in the assessment should be
devised e.g. structure, complexity.
Each factor (or group of factors) should be assigned a weighting to indicate
its importance to the overall maintainability of the system. Each factor will
have a maximum score of 10. The higher the score the less maintainable the
system.
During the assessment a score is awarded against each factor on the list. For
example, a relatively old system may be awarded a score of 8 out of 10 to
indicate that, due to its age, the system will be relatively difficult to maintain.
The scores for each of the factors assessed are then multiplied by the
appropriate weighting and the resultant products are then summed to give an
overall score which forms the Maintainability Measure of the system (the
lower the score, the better the maintainability of the software system).
Example factors which can be used in a maintainability assessment are given
below; the list is not exhaustive and should be modified to suit an individual
organization (although it is helpful if the same list is used throughout the
organization so comparisons between systems can be made):
Size, complexity, structure, development process, documentation, development
team, development timescale, maintenance procedures, development relationships,
maintainers’ perception, environmental facilities, maintenance relationships,
system users/customers, maintenance team, test facilities, operating procedures,
problem change traffic, business change traffic.
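The weighted-sum computation described in the steps above might look like the following sketch (the factor names, weights and scores are illustrative; each score is 0-10, higher meaning harder to maintain, so a lower total is better):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Weighted-sum maintainability measure: each factor maps to a
// {weight, score} pair; the total is the sum of weight * score.
public class MaintainabilityScore {

    public static double score(Map<String, double[]> factors) {
        double total = 0;
        for (double[] ws : factors.values()) {
            double weight = ws[0];
            double s = ws[1];
            if (s < 0 || s > 10) {
                throw new IllegalArgumentException("factor score must be 0-10");
            }
            total += weight * s;               // weighted contribution
        }
        return total;
    }

    public static void main(String[] args) {
        Map<String, double[]> factors = new LinkedHashMap<>();
        factors.put("complexity",    new double[]{0.4, 8});  // {weight, score}
        factors.put("documentation", new double[]{0.3, 3});
        factors.put("structure",     new double[]{0.3, 5});
        System.out.println(score(factors));    // 0.4*8 + 0.3*3 + 0.3*5 ≈ 5.6
    }
}
```

Using the same factor list and weights across systems, as the text recommends, is what makes the resulting measures comparable.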
3.6 Other requirements
Non-Functional Requirements
Accessibility
Capacity, current and forecast
Compliance
Documentation
Disaster recovery
Efficiency
Effectiveness
Extensibility
Fault tolerance
Interoperability
Maintainability
Privacy
Portability
Quality
Reliability
Resilience
Response time
Robustness
Scalability
Security
Stability
Supportability
Testability
Functional Requirements
System must be fast and efficient
User friendly GUI
Reusability
Performance
System validation of input
Proper output
4. Supporting information
4.1 Table of contents and index
4.2 Appendixes
SET THEORY

1. Problem description
Let S be a system which does analysis and reads documents, such that S = {S1, S2, S3, S4}, where:
S1 represents the Image Encryption & Selection module
S2 represents the Data Embedding module
S3 represents the Data Extraction & Image Recovery module
S4 represents the Result module
Observation: S holds the list of modules in the system.

2. Activities

2.1 Activity I: Image Encryption & Selection
Let S1 be the set of parameters for selecting an image: S1 = {Image_Size, Image_Quality}, where Image_Size is the actual size of the image and Image_Quality is the quality of the image.
Condition/Operation: if Image_Quality == Good then f1: Proceed(); else the image is discarded.
Observation: if the image quality is valid then proceed, else discard the image.

2.2 Activity II: Data Embedding module
Let S2 be the set of embedded data: S2 = {Data_Size}.
Condition/Operation: if Data_Size is within the limit (in KB) then f2: Proceed(); else the data is not accepted.
Observation: if the data size is less than or equal to the limit in KB (kilobytes) then proceed; else the data is not accepted and not embedded.

2.3 Activity III: Data Extraction & Image Recovery module
Let S3 be the set of parameters for data extraction and image recovery: S3 = {Image_Encry_Key, Data_Hiding_Key}, where Image_Encry_Key is the encryption key generated during the image encryption phase and Data_Hiding_Key is the key generated during the data hiding phase.
Condition/Operation: if Image_Encry_Key == valid key then f3: Proceed(); else the image is not recovered. If Data_Hiding_Key == valid key then f4: Proceed(); else the data is not recovered.
Observation: if Image_Encry_Key and Data_Hiding_Key are valid then proceed further; else the image is not recovered and the data is not extracted.

3. Venn diagram
As described above, the entire process maps the input (encrypted image and data) to the output (decrypted image and data).
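The guard conditions in the activities above can be read as simple boolean functions, sketched here with illustrative types (the real modules operate on images and embedded data, not strings and flags):

```java
// The activity guards from the set-theory description as boolean
// functions. Types and parameters are illustrative stand-ins.
public class ModuleGuards {

    // Activity I: accept an image only if its quality is judged good.
    public static boolean f1Proceed(boolean qualityGood) {
        return qualityGood;                    // else the image is discarded
    }

    // Activity III: recover image and data only when both keys are valid.
    public static boolean f3f4Proceed(String imageKey, String hidingKey,
                                      String validImageKey, String validHidingKey) {
        return imageKey.equals(validImageKey)
            && hidingKey.equals(validHidingKey);
    }

    public static void main(String[] args) {
        System.out.println(f1Proceed(true));                       // true
        System.out.println(f3f4Proceed("k1", "k2", "k1", "k2"));   // true
        System.out.println(f3f4Proceed("k1", "bad", "k1", "k2"));  // false
    }
}
```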
Bibliography
Java Servlet Programming - O'Reilly
Java: The Complete Reference
Computer Networks, Third Edition - Andrew S. Tanenbaum
Modern Operating Systems - Andrew S. Tanenbaum
Software Engineering A Practitioner's Approach - Roger S. Pressman