Analysis of Economic Data Using Big Data
Presented By
SHIVUMANJESH P
[4JC13MCA51]
VI SEM MCA
SJCE
Internal Guide
C J HARSHITHA
Assistant Professor
Dept. Of MCA
SJCE
External Guide
Imran Basha
Senior Consultant
Snipe IT Solutions
JSS MAHAVIDYAPEETHA
SRI JAYACHAMARAJENDRA COLLEGE OF ENGINEERING MYSURU-570006
AN AUTONOMOUS INSTITUTE AFFILIATED TO
VISVESVARAYA TECHNOLOGICAL UNIVERSITY, BELAGAVI.
Presentation on
Problem Definition
1. Rising inflation is a serious threat to a country's development.
2. Unscientific farming practices.
3. The big-picture problem: economic indicators and decision makers rely on
native economic transactions and data records.
Objective
• To examine economic data and record the year-to-year increases and
decreases in the prices of vegetables and other food items.
• To encourage the choice of fresh, edible food products and help
overcome problems of nutritional deficiency and malnutrition.
• To maintain continuous connectivity across the demand-supply chain.
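As a minimal sketch of the year-to-year price tracking described in the objectives, the Python function below computes the percentage change between consecutive years, (new - old) / old × 100, and labels each change as an increase or a decrease. The function name and the sample prices are hypothetical and are not data from this project; prices are assumed to be positive.

```python
def year_over_year(prices):
    """Return (year, percent_change, direction) for each consecutive pair of years.

    `prices` is a hypothetical mapping of year -> average price (e.g. rupees per kg).
    Prices are assumed to be positive, so the division below is safe.
    """
    years = sorted(prices)
    changes = []
    for prev, curr in zip(years, years[1:]):
        old, new = prices[prev], prices[curr]
        pct = (new - old) / old * 100  # change relative to the earlier year
        if pct > 0:
            direction = "increase"
        elif pct < 0:
            direction = "decrease"
        else:
            direction = "no change"
        changes.append((curr, round(pct, 2), direction))
    return changes


# Hypothetical sample prices, for illustration only
sample = {2011: 20.0, 2012: 25.0, 2013: 22.5}
for year, pct, direction in year_over_year(sample):
    print(year, pct, direction)
```

With these sample values the sketch reports a 25.0% increase for 2012 and a 10.0% decrease for 2013.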
Scope of The Project
• Economic data analysis has a large impact on e-commerce and can
strengthen business activities and investment decisions.
• The analysis is limited to particular products and can be extended in
future based on requirements and developments.
• The results of the big data analysis can be presented through an Android
application with a simple, smart user interface covering the products
people use in daily life.
• The system requires high-end hardware, since it deals with large data
sets with diverse features and functionality.
User characteristics
• The system provides a precise and simple platform for its
users.
• The admin grants access to the developer and the user, and
provides the data sets.
• The developer collects the data sets, clusters the data
based on particular criteria, and analyzes the behavior
of the data elements.
• The user gets the desired result by running a query.
General constraints
• Big data tools are efficient for large data sets and are not
suitable for small volumes of data.
• Since the main objective is data analysis, the user interface
is given the lowest priority.
• Working with completely unstructured data items can be
tedious.
• Data obtained from various sources may not share the same
parameters.
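One common way to handle sources that do not share the same parameters is to map every source into a single common record before analysis. The sketch below assumes two hypothetical sources: one reports prices per quintal (1 quintal = 100 kg) under the field names `commodity`, `yr`, and `price_per_quintal`, and the other reports prices per kg under `item`, `year`, and `price_kg`. These field names and the `PriceRecord` type are illustrative assumptions, not the project's actual schemas.

```python
from typing import NamedTuple


class PriceRecord(NamedTuple):
    # Hypothetical common schema shared by all sources after normalization
    product: str
    year: int
    price_per_kg: float


def from_source_a(row: dict) -> PriceRecord:
    # Hypothetical source A reports prices per quintal (1 quintal = 100 kg)
    return PriceRecord(row["commodity"].strip().lower(), int(row["yr"]),
                       float(row["price_per_quintal"]) / 100)


def from_source_b(row: dict) -> PriceRecord:
    # Hypothetical source B already reports prices per kg, under other field names
    return PriceRecord(row["item"].strip().lower(), int(row["year"]),
                       float(row["price_kg"]))


a = from_source_a({"commodity": "Onion ", "yr": "2013", "price_per_quintal": "2500"})
b = from_source_b({"item": "onion", "year": 2013, "price_kg": "25"})
print(a == b)  # True: both describe onion at 25.0 per kg in 2013
```

Normalizing names, types, and units up front means later steps can treat every record identically, whichever source it came from.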
Functional Requirements
Storage
• Hadoop Distributed File System is designed for storing very large
files with streaming data access patterns, running on clusters of
commodity hardware.
• Economic data is a highly diversified data set that is large in both
volume and variety.
• A dataset is typically generated or copied from source, and then
various analyses are performed on that dataset over time.
• Applications that require low-latency access to data, in the tens of
milliseconds range, will not work well with HDFS.
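HDFS stores each file as a sequence of fixed-size blocks, and each block is replicated across DataNodes. The sketch below estimates how many blocks a file occupies and how much raw cluster storage it consumes, assuming the Hadoop 2.x+ defaults of a 128 MB block size (`dfs.blocksize`) and a replication factor of 3 (`dfs.replication`); a real cluster may be configured differently. The 1000 MB file size in the example is hypothetical.

```python
import math

BLOCK_SIZE_MB = 128  # default dfs.blocksize in Hadoop 2.x and later
REPLICATION = 3      # default dfs.replication


def hdfs_footprint(file_size_mb: float, block_size_mb: int = BLOCK_SIZE_MB,
                   replication: int = REPLICATION):
    """Return (number of blocks, raw cluster storage in MB) for one file."""
    blocks = math.ceil(file_size_mb / block_size_mb)
    # HDFS does not pad the last block to full size, so raw usage is
    # simply the file size multiplied by the replication factor
    raw_storage_mb = file_size_mb * replication
    return blocks, raw_storage_mb


print(hdfs_footprint(1000))  # (8, 3000)
```

Large blocks keep the number of blocks (and the NameNode metadata) small, which is why HDFS favors a few very large files read by streaming over many small files or low-latency lookups.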
Computation
• MapReduce is a processing technique that allows for
massive scalability across hundreds or thousands of
servers in a Hadoop cluster.
• The MapReduce algorithm contains two important tasks,
namely Map and Reduce.
• In economic data analysis, MapReduce helps find the
demand for particular goods based on certain keywords.
• The time taken by the shuffle and sort phase depends
mainly on the volume of the data sets.
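The Map, shuffle/sort, and Reduce steps above can be illustrated with a small in-memory simulation. This is a sketch of the technique in plain Python, not Hadoop's API (a real job would use Hadoop's Java `Mapper`/`Reducer` classes or Hadoop Streaming). The record format `date,product,quantity`, the keyword filter, and the sample records are hypothetical.

```python
from collections import defaultdict
from typing import Iterable, Iterator, List, Tuple


def map_phase(record: str) -> Iterator[Tuple[str, int]]:
    # Each hypothetical record is "date,product,quantity"; emit (product, quantity)
    _date, product, quantity = record.split(",")
    yield product.strip().lower(), int(quantity)


def shuffle_and_sort(pairs: Iterable[Tuple[str, int]]) -> List[Tuple[str, List[int]]]:
    # Group all values by key and sort by key, as Hadoop does between Map and Reduce
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    # Sum the quantities for one product to estimate its demand
    return key, sum(values)


def run_job(records: Iterable[str], keywords: set) -> List[Tuple[str, int]]:
    mapped = (pair for r in records for pair in map_phase(r))
    wanted = (pair for pair in mapped if pair[0] in keywords)  # keyword filter
    return [reduce_phase(k, vs) for k, vs in shuffle_and_sort(wanted)]


records = [
    "2013-01-05,Onion,40",
    "2013-01-05,Tomato,25",
    "2013-01-06,onion,10",
    "2013-01-06,Potato,30",
]
print(run_job(records, {"onion", "tomato"}))  # [('onion', 50), ('tomato', 25)]
```

In a real cluster the Map calls run in parallel on the nodes holding each HDFS block, and the shuffle moves every value for a key to one reducer; that data movement is why the shuffle and sort phase grows with data volume.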
Performance Requirements
• The main reason for choosing big data technology for
economic analysis is the speed (velocity) of data
processing.
• Connecting commodity systems as nodes of a cluster
enables quick retrieval of data items.
• A distributed system environment offers considerable
flexibility.
• Hardware Requirements
Processor : Core i3 or later
RAM : 4 GB or more
Hard disk space : 40 GB or more
• Software Requirements
Technology : Hadoop
Tools : Apache Hive, Apache Pig, Apache Sqoop, Apache Oozie, RStudio
Operating System : Linux
System Architecture
Class Diagram
Algorithm Design
Dataflow Diagram
Level 1 DFD
Activity Diagram
Use Case Diagram: Admin & User
Use Case Diagram: Developer
Sequence Diagram
Requirements
Unstructured Datasets
Structured Datasets
System Implementation
R Environment
Experimental Results
Test Cases
| Test case no. | Test case description | Required input | Expected output | Actual output | Pass/Fail |
|---|---|---|---|---|---|
| TC 01 | Verification of the nodes | Command to start Hadoop nodes (start-all.sh) | All nodes should start | All nodes are present | Pass |
| TC 02 | Verification of Hive installation | Hive version command | Installed Hive version should be returned | Hive version is returned | Pass |
| TC 03 | Verification of Pig installation | Command to start Pig (/opt/pig) | Grunt shell should be returned | Grunt shell is returned | Pass |
| TC 04 | Verification of Sqoop installation | Sqoop version command | Installed Sqoop version should be returned | Sqoop version is returned | Pass |
| TC 05 | Verification of data imported to HDFS from RDBMS | Entering data into HDFS from the local file system | Imported data should be present in HDFS | Imported data is present in HDFS | Pass |
| TC 06 | Validating the user query | Entering a query | A valid query should be entered | A valid query is entered | Pass |
| TC 07 | Testing the processed data | Post query | Processed data should be correct | Processed data is valid | Pass |
| TC 08 | Importing the processed data into R | Import dataset | Processed data should be imported | Processed data is imported | Pass |
| TC 09 | Mapping of the processed dataset | barplot() | Processed dataset should be mapped correctly | Processed data is mapped correctly | Pass |
| TC 10 | Mapping in a pie chart | pie() | Processed data should be mapped in percent | Processed data is not mapped in percent | Fail |
| TC 11 | Retrieving the result in less than 5 seconds | DUMP | Result should be displayed within 5 seconds | Result takes more than 5 seconds to display | Fail |
| TC 12 | Plotting the values obtained in R | plot() | All values should be obtained | Some values are missing | Fail |
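Test case TC 10 failed because the pie chart did not show each slice as a percentage. The underlying arithmetic is share = value / total × 100. The Python sketch below builds percentage labels from a mapping of category to value; in R, label strings like these could be passed through the `labels` argument of `pie()`. The function name and the demand figures are hypothetical.

```python
def percent_labels(values: dict, decimals: int = 1) -> dict:
    """Map each category to a label showing its percentage share of the total."""
    total = sum(values.values())
    if total == 0:
        raise ValueError("cannot compute shares of an all-zero dataset")
    return {
        name: f"{name} {value / total * 100:.{decimals}f}%"
        for name, value in values.items()
    }


# Hypothetical demand figures, for illustration only
print(percent_labels({"onion": 50, "tomato": 30, "potato": 20}))
# {'onion': 'onion 50.0%', 'tomato': 'tomato 30.0%', 'potato': 'potato 20.0%'}
```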
Conclusion
• Statistical analysis is carried out for fruits and
vegetables over the period 1970-2013.
• The major requirements stem from the inflation
problem.
• The analysis is mainly product-based and year-based.
• This analysis serves as a vital input for machine
learning.
Future Enhancements
• The analysis can be extended further to food grains.
• An enterprise application can be built by embedding a
search engine, which will help end users.
• The data sets can be tuned, which may lead to deriving
other requirements from a different paradigm.
• The graphical representation can be improved by
displaying accurate values rather than ranges.
Company Details
Company Name : Snipe IT Solutions
Address : # 123, 3rd floor,
70th Cross, 5th Block,
Rajajinagar,
Bengaluru.
External guide : Imran Basha
Senior Consultant
Snipe IT Solutions
Email : mkimranbasha@gmail.com
Ph no : 9590071811