This paper is an effort to present a basic understanding of BIG DATA and HADOOP and their usefulness to an organization from the performance perspective. Along with an introduction to BIG DATA, the important parameters and attributes that make this emerging concept attractive to organizations have been highlighted. The paper also evaluates the differences in the challenges faced by a small organization as compared to a medium or large-scale operation, and therefore the differences in their approach to and treatment of BIG DATA. Hadoop itself is a large-scale, open-source software framework dedicated to scalable, distributed, data-intensive computing. A number of examples of BIG DATA implementations across industries varying in strategy, product and processes have been presented. The paper also deals with the technology aspects of implementing BIG DATA in organizations, since HADOOP has emerged as a popular tool for BIG DATA implementation. MapReduce is a programming framework for easily writing applications that process vast amounts of data (multi-terabyte data sets) in parallel on large clusters of commodity hardware in a reliable, fault-tolerant manner. A MapReduce framework comprises two parts, the "mapper" and the "reducer", both of which are examined in this paper. The paper covers the overall architecture of HADOOP along with the details of its various components for Big Data.
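Because the abstract describes the mapper and reducer only in prose, a minimal sketch may help. The two Hadoop Streaming scripts below implement the classic word count; Streaming runs any executable that reads records on standard input, and the names mapper.py and reducer.py are illustrative assumptions, not something prescribed by the paper.

```python
#!/usr/bin/env python3
# mapper.py: emits one (word, 1) pair per token read from standard input.
# Hadoop Streaming feeds each line of the input split to this script.
import sys

for line in sys.stdin:
    for word in line.strip().split():
        print(f"{word}\t1")
```

```python
#!/usr/bin/env python3
# reducer.py: sums the counts for each word. The shuffle phase delivers
# all pairs with the same key contiguously and sorted by key, so one
# pass with a running total is enough.
import sys

current_word, current_count = None, 0
for line in sys.stdin:
    word, count = line.rstrip("\n").split("\t", 1)
    if word != current_word:
        if current_word is not None:
            print(f"{current_word}\t{current_count}")
        current_word, current_count = word, 0
    current_count += int(count)
if current_word is not None:
    print(f"{current_word}\t{current_count}")
```

Because the scripts only touch standard input and output, they can be tested locally with an ordinary shell pipeline before ever reaching a cluster; a sketch of the cluster submission appears further down this page.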
IRJET- A Scrutiny on Research Analysis of Big Data Analytical Method and Clou... (IRJET Journal)
This document discusses big data analytical methods, cloud computing, and how they can be combined. It explains that big data involves large amounts of structured, semi-structured, and unstructured data from various sources that requires significant computing resources to analyze. Cloud computing provides a way for big data analytics to be offered as a service and processed efficiently using cloud resources. The integration of big data and cloud computing allows organizations to gain business intelligence from large datasets in a flexible, scalable and cost-effective manner.
IRJET- A Comparative Study on Big Data Analytics Approaches and Tools (IRJET Journal)
This document provides an overview of big data analytics approaches and tools. It begins with an abstract discussing the need to evaluate different methodologies and technologies based on organizational needs to identify the optimal solution. The document then reviews literature on big data analytics tools and techniques, and evaluates challenges faced by small vs large organizations. Several big data application examples across industries are presented. The document also introduces concepts of big data including the 3Vs (volume, velocity, variety), describes tools like Hadoop, Cloudera and Cassandra, and discusses scaling big data technologies based on an organization's requirements.
This document discusses challenges and solutions related to big data implementation. Some key challenges mentioned include reluctance to invest in big data strategies, integrating traditional and big data, and finding professionals with both big data and domain skills. The document recommends starting small with proofs of concept and taking an iterative approach to derive early benefits from big data before making larger investments. It also stresses the importance of having an enterprise-wide data strategy and acquiring various skills needed for big data projects.
Big Data Systems: Past, Present & (Possibly) Future with @techmilind (EMC)
The document discusses the history and evolution of big data systems. It describes how traditional RDBMS databases could not handle the increasing volume, velocity, and variety of data being generated. This led to the development of new big data systems like Hadoop. Hadoop was developed based on Google's paper on its distributed file system GFS and MapReduce framework. Hadoop is now widely adopted and its ecosystem has grown to include components like Hive, Pig, and HBase. The document also discusses how big data systems are now used across many industries beyond internet companies to gain insights from large, diverse datasets.
This document provides case studies on how several companies leverage big data, including Google, GE, Cornerstone, and Microsoft. The Google case study describes how Google processes billions of search queries daily and uses this data to continuously improve its search algorithms. The GE case study outlines how GE collects vast amounts of sensor data from power turbines, jet engines, and other industrial equipment to optimize operations and efficiency. The Cornerstone case study examines how Cornerstone uses employee data to help clients predict retention and performance. Finally, the Microsoft case study discusses how Microsoft has positioned itself as a major player in big data and offers data hosting and analytics services.
This document provides an overview of big data and commonly used methodologies. It defines big data as large volumes of complex data from various sources that is difficult to process using traditional data management tools. The key aspects of big data are volume, variety, and velocity. Hadoop is discussed as a popular framework for processing big data using the MapReduce programming model. HDFS is summarized as a distributed file system used with Hadoop to store and manage large datasets across clusters of computers. Challenges of big data such as storage capacity, processing large and complex datasets, and real-time analytics are also mentioned.
This document discusses combining Hadoop with big data analytics. It begins by exploring how Hadoop has become a popular framework for handling big data challenges. It then discusses some of the key skills needed for successful big data analytics programs, including technical skills with tools like Hadoop as well as business knowledge. Specifically, it recommends including business analysts, BI developers, predictive model builders, data architects, data integration developers, and technology architects on any big data analytics team.
Moving Toward Big Data: Challenges, Trends and Perspectives (IJRESJOURNAL)
Abstract: Big data refers to the organizational data asset that exceeds the volume, velocity, and variety of data typically stored using traditional structured database technologies. This type of data has become an important resource from which organizations can gain valuable insight and make business decisions by applying predictive analysis. This paper provides a comprehensive view of the current status of big data development, starting from the definition and the description of Hadoop and MapReduce, the framework that standardizes the use of clusters of commodity machines to analyze big data. For the organizations that are ready to embrace big data technology, significant adjustments to infrastructure and to the roles played by IT professionals and BI practitioners must be anticipated, as discussed in the challenges section. The landscape of big data development changes rapidly, which is directly related to the trend of big data; clearly, a major part of the trend is the result of the attempt to deal with the challenges discussed earlier. Lastly, the paper covers the most recent job prospects related to big data, including descriptions of several job titles that comprise the big data workforce.
This document discusses big data and Hadoop. It begins with an introduction to big data, noting the volume, variety and velocity of data. It then provides an overview of Hadoop, including its core components HDFS for storage and MapReduce for processing. The document also outlines some of the key challenges of big data including heterogeneity, scale, timeliness, privacy and the need for human collaboration in analysis. It concludes by discussing how Hadoop provides a solution for big data processing through its distributed architecture and use of HDFS and MapReduce.
Big Data Applications | Big Data Analytics Use-Cases | Big Data Tutorial for ... (Edureka!)
( ** Hadoop Training: https://www.edureka.co/hadoop ** )
This Edureka tutorial on "Big Data Applications" explains how Big Data analytics can be used in various domains. Following are the topics included in this tutorial:
1. Why do we need Big Data Analytics?
2. Big Data Applications in Health Care.
3. Big Data in Real World Clinical Analytics.
4. Big Data Analytics in Education Sector.
5. IBM Case Study in the Education Sector.
6. Big data applications and use cases in E-Commerce.
7. How do Governments use Big Data Analytics?
8. How is Big Data helpful in an E-Government Portal?
9. Big Data in IoT.
10. Smart city concept.
11. Big Data Analytics in Media and Entertainment.
12. Netflix Example in Big Data.
13. Future Scope of Big data.
Check our complete Hadoop playlist here: https://goo.gl/hzUO0m
This document discusses big data analytics techniques like Hadoop MapReduce and NoSQL databases. It begins with an introduction to big data and how the exponential growth of data presents challenges that conventional databases can't handle. It then describes Hadoop, an open-source software framework that allows distributed processing of large datasets across clusters of computers using a simple programming model. Key aspects of Hadoop covered include MapReduce, HDFS, and various other related projects like Pig, Hive, HBase etc. The document concludes with details about how Hadoop MapReduce works, including its master-slave architecture and how it provides fault tolerance.
This report examines the rise of big data and analytics used to analyze large volumes of data. It is based on a survey of 302 BI professionals and interviews. Most organizations have implemented analytical platforms to help analyze growing amounts of structured data. New technologies also analyze semi-structured data like web logs and machine data. While reports and dashboards serve casual users, more advanced analytics are needed for power users to fully leverage big data.
Big Data Trends - WorldFuture 2015 Conference (David Feinleib)
David Feinleib's Big Data Trends presentation from the World Future Society's Annual Conference, WorldFuture 2015, held at the Hilton Union Square, San Francisco, California July 25, 2015.
Analytics: The Real-world Use of Big Data (David Pittman)
UPDATE: Register now to participate in the 2013 survey: http://ibm.com/2013bigdatasurvey IBM’s Institute for Business Value (IBV) and the University of Oxford released their information-rich and insightful report “Analytics: The real-world use of big data.” Based on a survey of over 1000 professionals from 100 countries across 25+ industries, the report provides insights into organizations’ top business objectives, where they are in their big data journey, and how they are advancing their big data efforts. It also provides a pragmatic set of recommendations to organizations as they proceed down the path of big data. For additional information, including links to a podcast with one of the lead researchers and a link to download the full report, visit http://ibm.co/RB14V0
Big data has been one of the most popular terms in the IT industry over the past decade. The term is vague and broad enough that essentially every one of us is living in a big-data world. Every time you do a Google search, like a post on Facebook, write something in WeChat or view an item on Amazon, you both use and contribute to someone's big data system. Managing so much data across many computers introduces unique challenges. In this talk, we review the landscape of big data platforms and discuss some lessons we learned from building them.
IBM is helping companies leverage big data through its IBM big data platform and supercomputing capabilities. The document discusses how Vestas Wind Systems uses IBM's solution to analyze weather data, with datasets growing from 2.8 petabytes to 24 petabytes, and to produce turbine-siting data in minutes instead of weeks. It also mentions how other customers such as x+1, KTH Royal Institute of Technology, and the University of Ontario Institute of Technology are achieving growth, reducing traffic times, and improving patient outcomes, respectively, through big data analytics. The VP of IBM business development hopes readers will consider IBM for their big data challenges.
BRIDGING DATA SILOS USING BIG DATA INTEGRATION (ijmnct)
1) The document discusses how big data integration can be used to bridge data silos that exist in many enterprises due to different business applications generating structured, semi-structured, and unstructured data. 2) It explains that traditional data integration techniques are not well-suited for big data due to issues with scale and handling semi-structured and unstructured data. 3) Big data integration techniques like Hadoop, Spark, Kafka and data lakes can be better suited for integrating large heterogeneous data sources in real-time or in batches at scale.
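To make the integration techniques named above concrete, here is a minimal PySpark sketch that bridges two silos, a structured CSV export and semi-structured JSON event logs; the HDFS paths and the customer_id join key are illustrative assumptions, not details from the summarized paper.

```python
# Bridging two silos with Spark: a structured CSV export from an RDBMS
# and semi-structured JSON event logs. Paths and column names are
# illustrative assumptions.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("silo-bridge").getOrCreate()

# Structured silo: schema taken from the CSV header row.
customers = spark.read.option("header", True).csv("hdfs:///silos/crm/customers.csv")

# Semi-structured silo: Spark infers a schema from the JSON documents.
events = spark.read.json("hdfs:///silos/web/clickstream/*.json")

# Join on a shared key and materialize a unified view in the data lake.
activity = customers.join(events, on="customer_id", how="inner")
activity.groupBy("customer_id").count().write.parquet("hdfs:///lake/customer_activity")

spark.stop()
```

The same frame-based code runs in batch or, through Spark's streaming APIs, over a Kafka topic, which is why the summary groups those tools together.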
The document discusses big data challenges faced by organizations. It identifies several key challenges: heterogeneity and incompleteness of data, issues of scale as data volumes increase, timeliness in processing large datasets, privacy concerns, and the need for human collaboration in analyzing data. The document describes surveying various organizations in Pakistan, including educational institutions, telecommunications companies, hospitals, and electrical utilities, to understand the big data problems they face. Common challenges included data errors, missing or incomplete data, lack of data management tools, and issues integrating different data sources. The survey found that while some organizations used big data tools, many educational institutions in particular did not, limiting their ability to effectively manage and analyze their large and growing datasets.
The document is a project report submitted by Suraj Sawant to his college on the topic of "Map Reduce in Big Data". It discusses the objectives, introduction and importance of big data and MapReduce. MapReduce is a programming model used for processing large datasets in a distributed manner. The document provides details about the various stages of MapReduce including mapping, shuffling and reducing data. It also includes diagrams to explain the execution process and parallel processing in MapReduce.
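The three stages that the report names can be simulated in a few lines of ordinary Python; the sketch below mirrors the data flow of map, shuffle and reduce on an in-memory toy dataset, not Hadoop's actual distributed execution.

```python
# In-process simulation of MapReduce's stages on a toy dataset. Real
# Hadoop distributes this work across nodes and persists intermediates.
from collections import defaultdict

documents = ["big data needs big clusters", "hadoop processes big data"]

# Map: emit (key, value) pairs from each input record.
mapped = [(word, 1) for doc in documents for word in doc.split()]

# Shuffle: group all values by key, as the framework does between stages.
groups = defaultdict(list)
for key, value in mapped:
    groups[key].append(value)

# Reduce: fold each key's list of values into a single result.
counts = {key: sum(values) for key, values in groups.items()}
print(counts)  # {'big': 3, 'data': 2, 'needs': 1, ...}
```

Parallelism falls out of this structure: the map calls are independent of one another, and after the shuffle each key can be reduced on a different node.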
This document discusses big data and tools for analyzing big data. It defines big data as very large data sets that are difficult to analyze using traditional data management systems due to the volume, variety and velocity of the data. It describes characteristics of big data including the 5 V's: volume, variety, velocity, veracity and value. It also discusses some common tools for analyzing big data, including MapReduce, Apache Hadoop, Apache Mahout and Apache Spark. Finally, it briefly mentions the potential of quantum computing for big data analysis by being able to process exponentially large amounts of data simultaneously.
Trends in Big Data & Business Challenges (Experian_US)
Join our #DataTalk on Thursdays at 5 p.m. ET. This week, we tweeted with Sushil Pramanick, who is the founder and president of The Big Data Institute (TBDI).
You can learn about upcoming chats and see the archive of past big data tweetchats here: http://www.experian.com/blogs/news/about/datadriven
Architecting a Big Data Platform for Analytics (Kun Le)
This document discusses the growth of big data and the need for businesses to analyze new and complex data sources. It describes how data has become more varied in type, larger in volume, and generated faster. It also outlines different types of big data analytics workloads and technology options for building an end-to-end big data analytics platform. Finally, it provides an example of IBM's solution for analyzing both data in motion and at rest across the entire big data analytics lifecycle.
Best practices for building and deploying predictive models over big data pre... (Kun Le)
The tutorial is divided into 12 modules that cover best practices for building and deploying predictive models over big data. It introduces key concepts like predictive analytics, building predictive models, and deploying models. The life cycle of a predictive model is also described, from exploratory data analysis to deployment and operations.
The document discusses strategies for organizations to better manage big data when resources are limited. It recommends identifying unused data in the data warehouse in order to reduce costs by moving that data to cheaper platforms like Hadoop. Organizations can save millions by offloading data that is not frequently queried but must be retained for regulatory reasons. The document also suggests purging data that is not needed at all to further reduce storage and management costs. Proper classification and placement of data onto platforms suited to its usage level and type, such as Hadoop for less critical datasets, can help organizations get more value from their data with fewer resources.
This document discusses how Oracle Data Integrator 12c (ODI12c) can bridge the gap between big data and enterprise data. It allows users to integrate both types of data through a single unified tool. Key features include application adapters for Hadoop that enable native Hadoop integration, loading and transforming of big data, and integrated platforms and real-time analytics to simplify, optimize, and extend the value of big data.
Leveraging research findings from EMA's 2012 "Big Data Comes of Age" Research Report, this new Infographic outlines the five business requirements driving Big Data solutions and the technologies that support those requirements.
Hadoop is an open-source framework for the reliable, scalable, distributed storage and processing of large datasets across clusters of computers. It consists of the Hadoop Distributed File System (HDFS) for storage and Hadoop MapReduce for processing vast amounts of data in parallel on large clusters of commodity hardware in a fault-tolerant manner. HDFS stores data reliably across the machines in a Hadoop cluster, and MapReduce processes data in parallel by breaking each job into smaller fragments of work executed across the cluster nodes.
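For a rough picture of how such a job reaches the cluster, the snippet below shells out to the Hadoop Streaming JAR to submit the word-count scripts sketched near the top of this page; the JAR location and the HDFS paths vary by installation and are assumptions here.

```python
# Sketch of submitting a Hadoop Streaming job. The streaming JAR path
# and the HDFS input/output paths are illustrative and differ across
# Hadoop distributions.
import subprocess

subprocess.run([
    "hadoop", "jar",
    "/opt/hadoop/share/hadoop/tools/lib/hadoop-streaming.jar",
    "-files", "mapper.py,reducer.py",   # ship the scripts to every node
    "-mapper", "python3 mapper.py",
    "-reducer", "python3 reducer.py",
    "-input", "/data/raw/events.log",   # HDFS input path
    "-output", "/data/out/wordcount",   # must not already exist
], check=True)
```

The framework handles the rest: splitting the input, scheduling map and reduce tasks near the data, and re-running any task whose node fails.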
Big data refers to large volumes of structured and unstructured data that are difficult to process using traditional database and software techniques. It encompasses the 3Vs - volume, velocity, and variety. Hadoop is an open-source framework that stores and processes big data across clusters of commodity servers using the MapReduce algorithm. It allows applications to work with huge amounts of data in parallel. Organizations use big data and analytics to gain insights for reducing costs, optimizing offerings, and making smarter decisions across industries like banking, government, and education.
This document provides an overview of big data concepts including:
- Mohamed Magdy's background and credentials in big data engineering and data science.
- Definitions of big data, the three V's of big data (volume, velocity, variety), and why big data analytics is important.
- Descriptions of Hadoop, HDFS, MapReduce, and YARN - the core components of Hadoop architecture for distributed storage and processing of big data.
- Explanations of HDFS architecture, data blocks, high availability in HDFS 2/3, and erasure coding in HDFS 3 (a storage-overhead comparison is sketched after this list).
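A quick way to see why erasure coding was added in HDFS 3 is to compare raw storage overheads; the sketch below assumes the commonly cited Reed-Solomon RS(6,3) layout (6 data blocks plus 3 parity blocks) against classic 3x replication.

```python
# Raw storage needed per logical byte: 3x replication vs. RS(6,3)
# erasure coding. The RS(6,3) layout is an assumed, commonly cited
# HDFS 3 configuration.

def replication_overhead(replicas: int) -> float:
    # n-way replication stores n raw bytes per logical byte.
    return float(replicas)

def erasure_overhead(data_blocks: int, parity_blocks: int) -> float:
    # RS(d, p) stores (d + p) / d raw bytes per logical byte.
    return (data_blocks + parity_blocks) / data_blocks

logical_tb = 100  # illustrative dataset size in TB
print(f"3x replication: {logical_tb * replication_overhead(3):.0f} TB raw")
print(f"RS(6,3) coding: {logical_tb * erasure_overhead(6, 3):.0f} TB raw")
# 300 TB vs. 150 TB, while RS(6,3) still tolerates the loss of any
# 3 of the 9 blocks in a stripe (3x replication tolerates 2 lost copies).
```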
Big data - what, why, where, when and how (bobosenthil)
The document discusses big data, including what it is, its characteristics, and architectural frameworks for managing it. Big data is defined as data that exceeds the processing capacity of conventional database systems due to its large size, speed of creation, and unstructured nature. The architecture for managing big data is demonstrated through Hadoop technology, which uses a MapReduce framework and open source ecosystem to process data across multiple nodes in parallel.
The document discusses big data testing using the Hadoop platform. It describes how Hadoop, along with technologies like HDFS, MapReduce, YARN, Pig, and Spark, provides tools for efficiently storing, processing, and analyzing large volumes of structured and unstructured data distributed across clusters of machines. These technologies allow organizations to leverage big data to gain valuable insights by enabling parallel computation of massive datasets.
This document discusses big data business opportunities and solutions. It notes that big data solutions are tailored to specific data types and workloads. Common business domains for big data include web analytics, clickstream analysis using the ELK stack, and big data in the cloud to provide auto-scaling, low costs, and use of cloud services. Effective big data solutions require data governance, cluster modeling, and analytics and visualization.
The Big Data Importance – Tools and their Usage (IRJET Journal)
This document discusses big data, tools for analyzing big data, and opportunities that big data analytics provides. It begins by defining big data and its key characteristics of volume, variety and velocity. It then discusses tools for storing, managing and processing big data like Hadoop, MapReduce and HDFS. Finally, it outlines how big data analytics can be applied across different domains to enable new insights and informed decision making through analyzing large datasets.
This document provides an overview and introduction to big data implementation strategies using Hadoop and beyond. It discusses how big data has evolved from technologies pioneered by companies like Google to analyze vast amounts of diverse data cheaper and more effectively than traditional methods. It also outlines some of the key challenges organizations face as data volumes, varieties, and velocities outgrow existing systems, and how new big data technologies like Hadoop provide more cost-effective solutions to process and analyze data at scale. The document notes that big data represents a shift in computing paradigms rather than just data size alone.
The document discusses Luminar, an analytics company that uses big data and Hadoop to provide insights about Latino consumers in the US. Luminar collects data from over 2,000 sources and uses that data along with "cultural filters" to identify Latinos and understand their purchasing behaviors. This provides more accurate information than traditional surveys. Luminar implemented a Hadoop system to more quickly analyze this large amount of data and provide valuable insights to marketers and businesses.
Big data is the term for any collection of data sets so large and complex that it becomes difficult to process using conventional data-handling applications. The challenges include analysis, capture, curation, search, sharing, storage, transfer, visualization, and privacy violations. To spot business trends, anticipate diseases, combat crime and so on, we require larger data sets compared with the smaller ones. Big data is hard to work with using most relational database management systems and desktop statistics and visualization packages, requiring instead massively parallel software running on tens, hundreds, or even thousands of servers. This paper presents an observation on Hadoop architecture, the different tools used for big data, and its security issues.
The Evolving Role of the Data Engineer - Whitepaper | Qubole (Vasu S)
A whitepaper about how the evolving data engineering profession helps data-driven companies work smarter and lower cloud costs with Qubole.
https://www.qubole.com/resources/white-papers/the-evolving-role-of-the-data-engineer
Enterprises are facing exponentially increasing amounts of data that is breaking down traditional storage architectures. NetApp addresses this "big data challenge" through their "Big Data ABCs" approach - focusing on analytics, bandwidth, and content. This enables customers to gain insights from massive datasets, move data quickly for high-speed applications, and securely store unlimited amounts of content for long periods without increasing complexity. NetApp's solutions provide a foundation for enterprises to innovate with data and drive business value.
Big Data Processing with Hadoop: A Review (IRJET Journal)
1. This document provides an overview of big data processing with Hadoop. It defines big data and describes the challenges of volume, velocity, variety and variability.
2. Traditional data processing approaches are inadequate for big data due to its scale. Hadoop provides a distributed file system called HDFS and a MapReduce framework to address this.
3. HDFS uses a master-slave architecture with a NameNode and DataNodes to store and retrieve file blocks. MapReduce allows distributed processing of large datasets across clusters through mapping and reducing functions.
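To ground the NameNode/DataNode description, here is a minimal Python sketch that drives the standard hdfs dfs command-line client to stage a local file into HDFS and list the result; it assumes a configured Hadoop client on the PATH, and the paths themselves are illustrative.

```python
# Staging a local file into HDFS via the standard `hdfs dfs` client.
# The NameNode records the file's block locations; the client streams
# the block data to DataNodes. Paths below are illustrative.
import subprocess

def hdfs(*args: str) -> str:
    """Run an `hdfs dfs` subcommand and return its standard output."""
    result = subprocess.run(
        ["hdfs", "dfs", *args],
        check=True, capture_output=True, text=True,
    )
    return result.stdout

hdfs("-mkdir", "-p", "/data/raw")               # create the target directory
hdfs("-put", "-f", "events.log", "/data/raw/")  # upload, overwriting if present
print(hdfs("-ls", "/data/raw"))                 # confirm the file landed
```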
Big data is a broad term for data sets so large or complex that tr.docx (hartrobert670)
Big data is a broad term for data sets so large or complex that traditional data processing applications are inadequate. Challenges include analysis, capture, curation, search, sharing, storage, transfer, visualization, and information privacy. The term often refers simply to the use of predictive analytics or other certain advanced methods to extract value from data, and seldom to a particular size of data set.
Analysis of data sets can find new correlations, to "spot business trends, prevent diseases, combat crime and so on."[1] Scientists, practitioners of media and advertising and governments alike regularly meet difficulties with large data sets in areas including Internet search, finance and business informatics. Scientists encounter limitations in e-Science work, including meteorology, genomics,[2] connectomics, complex physics simulations,[3] and biological and environmental research.[4]
Data sets grow in size in part because they are increasingly being gathered by cheap and numerous information-sensing mobile devices, aerial (remote sensing), software logs, cameras, microphones, radio-frequency identification (RFID) readers, and wireless sensor networks.[5]
[6][7] The world's technological per-capita capacity to store information has roughly doubled every 40 months since the 1980s;[8] as of 2012, every day 2.5 exabytes (2.5×10^18 bytes) of data were created.[9] The challenge for large enterprises is determining who should own big data initiatives that straddle the entire organization.[10]
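As an aside, the doubling figure quoted above implies a compound annual growth rate that two lines of Python make explicit:

```python
# "Doubles every 40 months" as an annual growth rate: 2^(12/40) - 1.
annual_growth = 2 ** (12 / 40) - 1
print(f"about {annual_growth:.1%} per year")  # about 23.1% per year
```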
Work with big data is necessarily uncommon; most analysis is of "PC size" data, on a desktop PC or notebook[11] that can handle the available data set.
Relational database management systems and desktop statistics and visualization packages often have difficulty handling big data. The work instead requires "massively parallel software running on tens, hundreds, or even thousands of servers".[12] What is considered "big data" varies depending on the capabilities of the users and their tools, and expanding capabilities make Big Data a moving target. Thus, what is considered to be "Big" in one year will become ordinary in later years. "For some organizations, facing hundreds of gigabytes of data for the first time may trigger a need to reconsider data management options. For others, it may take tens or hundreds of terabytes before data size becomes a significant consideration."[13]
Contents
· 1 Definition
· 2 Characteristics
· 3 Architecture
· 4 Technologies
· 5 Applications
· 5.1 Government
· 5.1.1 United States of America
· 5.1.2 India
· 5.1.3 United Kingdom
· 5.2 International development
· 5.3 Manufacturing
· 5.3.1 Cyber-Physical Models
· 5.4 Media
· 5.4.1 Internet of Things (IoT)
· 5.4.2 Technology
· 5.5 Private sector
· 5.5.1 Retail
· 5.5.2 Retail Banking
· 5.5.3 Real Estate
· 5.6 Science
· 5.6.1 Science and Resear ...
Influence of Hadoop in Big Data Analysis and Its Aspects

International Open Access Journal of Modern Engineering Research (IJMER)
ISSN: 2249–6645 | www.ijmer.com | Vol. 5 | Iss. 2 | Feb. 2015

Sumaiya Nazneen, Sara Siraj, Tabassum Sultana, Nabeelah Azam, Tayyiba Ambareen, Ethemaad Uddin Ahmed, Mohammed Salman Irshad
Students, Department of Information Technology, Muffakham Jah College of Engineering and Technology, Mount Pleasant, 8-2-249, Road Number 3, Banjara Hills, Hyderabad, Telangana, INDIA.

Keywords: Big data, Hadoop, analytic databases, framework, HDFS, MapReduce.
I. Introduction
Companies across the world have been using data for a long time to help them make better decisions in order to enhance their performance. It was the first decade of the 21st century that actually showcased a rapid shift in the availability of data and its applicability for improving the overall effectiveness of the organization. This change was to revolutionize the use of data.
Hadoop was designed especially for the analysis of large data sets, to build scalable, distributed applications. To manage very large volumes of data, Hadoop implements the paradigm called MapReduce, defined by Google, according to which applications are divided into small fragments of work, each of which may be executed on any node in the cluster. Companies like Amazon, Cloudera, IBM, Intel, Twitter, Facebook and others are using Apache Hadoop technology to manage their enormous volumes of data and to gain insight into where the market is headed.
What is Big Data: Big data refers to the availability of large amounts of data that become difficult to store, process and mine using a traditional database, primarily because the data are large, complex, unstructured and rapidly changing. This is probably one of the important reasons why the concept of big data was first embraced by online firms like Google, eBay, Facebook and LinkedIn.
Big data in small vs. big companies: There is a specific reason why big data was first appreciated by the online firms and start-ups mentioned above. These companies were built around the concept of using rapidly changing data, and did not face the challenge of integrating new, unstructured data with data already on hand. Looking at the big data challenges faced by the online firms and start-ups, we can highlight the following:
i. Volume: The sheer size of the available data made it a challenge, as it was neither possible nor efficient to handle such a large volume of data using traditional databases.
ii. Variety: Where earlier data came in only one or two forms (typically text and tables), it now additionally arrives as pictures, videos, tweets and so on.
iii. Velocity: Increasing use of the online space meant that the available data changed rapidly, and therefore had to be made available and used at the right time to be effective.
MapReduce is a programming model designed for processing immensely large volumes of data in parallel by dividing the work into a set of independent tasks. MapReduce programs are written in a particular style influenced by functional programming constructs, specifically idioms for processing lists of data. A MapReduce program is composed of a Map() procedure that performs filtering and sorting (for example, sorting people by first name into queues, one queue for each name) and a Reduce() procedure that performs a summary operation. The "MapReduce framework" (also called the "infrastructure") orchestrates the work by marshalling the distributed servers, running the various tasks in parallel, managing all communications and data transfers between the numerous parts of the system, and providing for redundancy and fault tolerance. A prevalent open-source implementation is Apache Hadoop.
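Conceptually, following the formulation in Dean and Ghemawat's original MapReduce paper [16], the two procedures can be summarized by the type signatures below, where k1/v1 are the input key/value types and k2/v2 the intermediate ones:

    map:    (k1, v1)       -> list(k2, v2)
    reduce: (k2, list(v2)) -> list(v2)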
II. Big Data
2.1 The Challenges for Big Firms:
Big data may be new for startups and for online firms, but many large firms view it as something they
have been wrestling with for a while. Some managers appreciate the innovative nature of big data, but more find
it “business as usual” or part of a continuing evolution toward more data. They have been adding new forms of
data to their systems and models for many years, and don’t see anything revolutionary about big data. Put
another way, many were pursuing big data before big data was big.
When these managers in large firms are impressed by big data, it’s not the “bigness” that impresses
them. Instead it’s one of three other aspects of big data: the lack of structure, the opportunities presented, and
low cost of the technologies involved. This is consistent with the results from a survey of more than fifty large
companies by New Vantage Partners in 2012. It found, according to the survey summary:
2.2 It’s About Variety, Not Volume:
The survey indicates companies are focused on the variety of data, not its volume, both today and in
three years. The most important goal and potential reward of Big Data initiatives is the ability to analyze diverse
data sources.
Application areas and implementation examples:
1. Big data for cost reduction: Some organizations pursuing big data believe strongly that, for storing large volumes of structured data, big data technologies like Hadoop clusters are very cost-effective solutions that can be efficiently utilized for cost reduction. One company's cost comparison, for example, estimated that the cost of storing one terabyte for a year was $37,000 for a traditional relational database, $5,000 for a database appliance, and only $2,000 for a Hadoop cluster. Of course, these figures are not directly comparable, in that the more traditional technologies may be somewhat more reliable and easily managed; data security approaches, for example, are not yet fully developed in the Hadoop cluster environment.
2.3 Big Data at UPS:
UPS is no stranger to big data, having begun to capture and track a variety of package movements and transactions as early as the 1980s. The company now tracks data on 16.3 million packages per day for 8.8 million customers, with an average of 39.5 million tracking requests from customers per day. The company stores over 16 petabytes of data.
Much of its recently acquired big data, however, comes from telematic sensors in over 46,000 vehicles.
The data on UPS package cars (trucks), for example, includes their speed, direction, braking, and drive train
performance. The data is not only used to monitor daily performance, but to drive a major redesign of UPS
drivers’ route structures. This initiative, called ORION (On-Road Integrated Optimization and Navigation),
is arguably the world’s largest operations research project. It also relies heavily on online map data, and will
eventually reconfigure a driver's pickups and drop-offs in real time. The project had already led to savings in 2011 of more than 8.4 million gallons of fuel by cutting 85 million miles off daily routes. UPS estimates that saving only one daily mile driven per driver saves the company $30 million, so the overall dollar savings are substantial. The company is also attempting to use data and analytics to optimize the efficiency of its 2,000 aircraft flights per day.
2.4 Big Data for Time Reduction:
The second common objective of big data technologies and solutions is time reduction. Macy’s
merchandise pricing optimization application provides a classic example of reducing the cycle time for
complex and large-scale analytical calculations from hours or even days to minutes or seconds. The department
store chain has been able to reduce the time to optimize pricing of its 73 million items for sale from over 27
hours to just over 1 hour. Described by some as “big data analytics,” this capability set obviously makes it
possible for Macy’s to re-price items much more frequently to adapt to changing conditions in the retail
marketplace. This big data analytics application takes data out of a Hadoop cluster and puts it into other parallel
computing and in-memory software architectures. Macy’s also says it achieved 70% hardware cost reductions.
Kerem Tomak, VP of Analytics at Macys.com, is using similar approaches to time reduction for
marketing offers to Macy’s customers. He notes that the company can run a lot more models with this time
savings.
2.5 Big Data for Improving Process Efficiency:
Big data can also be used to improve process efficiency. An excellent example in this regard is cricket, especially since the advent of the Indian Premier League (IPL). Not only are matches analyzed using the available data in order to formulate future strategies, but even minute details, like the performance of a bowler against a particular batsman on a particular ground under certain conditions, are made available for the stakeholders to improve their efficiency.
For example, how a batsman like Glenn Maxwell will perform against a bowler like Sunil Narine at Eden Gardens, and how different that will be at Mohali in Chandigarh, is available to be used, along with data such as how many balls a particular batsman has faced against a particular bowler, the number of dot balls, and the number of runs scored. Another example is the use of big data to predict the probability, at any time during a match, of a team winning or losing, based on extrapolation from the results in similar match situations.
III. Hadoop As An Open Source Tool For Big Data Analytics
Hadoop is a distributed software solution: a scalable, fault-tolerant distributed system for data storage and processing. There are two main components in Hadoop:
(i) HDFS, the storage layer; and
(ii) MapReduce, the retrieval and processing layer.
HDFS is high-bandwidth cluster storage, and it works as follows (Fig. 1). When we put, say, a petabyte file on our Hadoop cluster, HDFS breaks it up into blocks and distributes them across all the nodes of the cluster. On top of that sits a fault-tolerance mechanism: HDFS has a configurable replication factor (set to 3 by default). When we put a file on Hadoop, HDFS makes sure there are three copies of every block that makes up that file, spread across all the nodes in the cluster. This is very useful and important, because if we lose a node, HDFS knows which blocks were on that node and replicates them elsewhere. How does it do that? A cluster generally has one NameNode and many DataNodes. The NameNode is essentially a metadata server: it holds in memory the location of every block on every node, and even in a multi-rack setup it knows on which rack, across the cluster and the network, each block resides. That is the secret behind HDFS and how data gets stored.
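To make the storage side concrete, here is a minimal sketch that writes a file to HDFS through the Hadoop Java FileSystem API and requests the three-way block replication described above. The NameNode address and file path are hypothetical placeholders (in practice the address comes from core-site.xml); treat this as an illustration, not as code from the paper.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsWriteExample {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Hypothetical NameNode address; normally supplied by core-site.xml.
            conf.set("fs.defaultFS", "hdfs://namenode.example.com:8020");
            FileSystem fs = FileSystem.get(conf);
            Path file = new Path("/data/sample.txt");
            // Write a small file; HDFS splits larger files into blocks for us.
            try (FSDataOutputStream out = fs.create(file)) {
                out.writeUTF("hello hdfs");
            }
            // Ask HDFS to keep 3 copies of each block (the default replication factor).
            fs.setReplication(file, (short) 3);
        }
    }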
MapReduce: How we get data back out is through MapReduce which, as the name implies, is a two-step process. There is a mapper and a reducer: programmers write the mapper function, which goes out and tells the cluster which data points to retrieve, and the reducer then takes all of that data and aggregates it.
Hadoop is a batch processing system: we work on all the data in the cluster, so we can say that MapReduce operates on all of the data inside our cluster. There is a myth that one needs to understand Java to get the most out of a cluster. In fact, engineers at Facebook built a subproject called Hive, which is a SQL interpreter. Facebook wanted a lot of people to be able to write ad hoc jobs against their cluster without forcing them to learn Java; that is why the Facebook team built Hive, and now anybody familiar with SQL can pull data out of the cluster.
Pig is another such tool, built by Yahoo; it is a high-level data flow language for pulling data out of clusters. Both Pig and Hive sit on top of Hadoop and turn their scripts and queries into MapReduce jobs submitted to the cluster. This is the beauty of an open-source framework: people can build and add to it, the community keeps growing, and more technologies and projects keep being added to the Hadoop ecosystem.
3.1 MAPREDUCE FRAMEWORK
The MapReduce framework consists of two steps, namely the Map step and the Reduce step. In the Map step, the master node takes a large problem as input, slices it into smaller sub-problems, and distributes these to worker nodes; a worker node may do this again, leading to a multi-level tree structure. Each worker processes its smaller problem and hands the result back to the master. In the Reduce step, the master node takes the answers to the sub-problems and combines them in a predefined way to get the output: the answer to the original problem. The MapReduce framework is fault-tolerant because each node in the cluster is expected to report back periodically with completed work and status updates. If a node remains silent for longer than the expected interval, the master node makes note of this and re-assigns its work to other nodes.
3.2 WORKFLOW IN MAPREDUCE
The key to how MapReduce works is to take input as, conceptually, a list of records. The records are
split among the different computers in the cluster by Map. The result of the Map computation is a list of
key/value pairs. Reducer then takes each set of values that has the same key and combines them into a single
value. So Map takes a set of data chunks and produces key/value pairs and Reduce merges things, so that instead
of a set of key/value pair sets, you get one result. You can't tell whether the job was split into 100 pieces or 2
pieces. MapReduce isn't intended to replace relational databases; its intent is to provide a lightweight way of programming things so that they can run fast by running in parallel on a lot of machines.
Fig. 1 Computation of MapReduce
MapReduce is important because it allows ordinary developers to use MapReduce library routines to
create parallel programs without having to worry about programming for intra-cluster communication, task
monitoring or failure handling. It is useful for tasks such as data mining, log file analysis, financial analysis
and scientific simulations. Several implementations of MapReduce are available in a variety of programming
languages, including Java, C++, Python, Perl, Ruby, and C. A typical Hadoop cluster integrates MapReduce and HDFS in a master/slave architecture consisting of a master node and multiple slave nodes. The master node contains a Job Tracker node and a Task Tracker node (MapReduce layer) plus a Name Node and a Data Node (HDFS layer); each of the multiple slave nodes contains a Task Tracker node (MapReduce layer) and a Data Node (HDFS layer). In short, the MapReduce layer has the job and task tracker nodes, while the HDFS layer has the name and data nodes.
Fig. 2 Layers in MapReduce
Although the Hadoop framework is implemented in Java, MapReduce applications need not be written in Java. Hadoop Streaming is a utility which allows users to create and run jobs with any executable (e.g. shell utilities) as the mapper and/or the reducer. Hadoop Pipes is a SWIG-compatible C++ API for implementing MapReduce applications (not JNI-based).
3.4 DATA ORGANIZATION:
In many organizations, Hadoop and other MapReduce solutions are only one piece of a larger data analysis platform. Data will typically have to be translated in order to interface cleanly with the organization's other systems. Similarly, data might have to be transformed from its original state into a new state that makes analysis in MapReduce easier.
3.4.1 NATURE OF DATA:
MapReduce systems such as Hadoop aren't being used only for text analysis anymore. An increasing number of users are deploying MapReduce jobs that analyze data once thought to be too difficult for the paradigm. One of the most obvious trends in the nature of data is the growth of image, audio, and video analysis. This kind of data is a serious prospect for a distributed system using MapReduce because these files are typically very big. Retailers want to examine their security video to detect which stores are busiest. Medical imaging analysis is becoming harder with the astronomical resolutions of modern images. Videos consist of colored pixels that change over time, laid out on a flat grid; the data can be analyzed by treating, say, a 10-pixel by 10-pixel by 5-second section of video and audio as a "record." As multidimensional data increases in popularity, more patterns are emerging for how to logically separate such data into records and input splits properly. For example, SciDB, an open-source analytical database, is specifically built to deal with multi-dimensional data. MapReduce is traditionally a batch analytics system, but streaming analytics is a natural next step. In many production MapReduce systems, data is always streaming in and then gets processed in batch at intervals. For instance, data from web server logs streams in continuously, but the MapReduce job runs only every hour. This is inconvenient for a few reasons; first, processing an hour's worth of data at once can strain resources. Novel systems that deal with streaming data in Hadoop have cropped up, most notably the commercial HStreaming product and the open-source Storm platform, released by Twitter.
3.5 SOME ESSENTIAL HADOOP PROJECTS:
Data Access: The reason we need more ways to access data inside Hadoop is that not everyone is a low-level Java, C or C++ programmer who can write MapReduce jobs to get at the data; and even then, operations that are routine in SQL, like grouping, aggregating and joining, are challenging to hand-code even for a professional. So we have data access libraries, and Pig is one of them. Pig is a high-level data flow scripting language. It is very easy to learn and pick up, and it does not have a lot of keywords: it is about loading data, filtering and transforming it, and either returning or storing the results. There are two core components of Pig:
Pig Latin: the programming language.
Pig Runtime: which compiles Pig Latin and converts it into MapReduce jobs submitted to the cluster.
Hive: Hive is another data access project, extremely popular like Pig. Hive is a way to project structure onto data inside a cluster, so it is, in effect, a database: a data warehouse built on top of Hadoop. It provides a query language, HiveQL, which is extremely similar to SQL. Like Pig, Hive converts these queries into MapReduce jobs that get submitted to the cluster.
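As an illustration of how close HiveQL is to SQL, here is a minimal sketch that runs a HiveQL query from Java over JDBC. The HiveServer2 host, database, table and column names are hypothetical, and the snippet assumes the Hive JDBC driver (org.apache.hive.jdbc.HiveDriver) is on the classpath; behind the scenes Hive compiles the query into MapReduce jobs.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class HiveQueryExample {
        public static void main(String[] args) throws Exception {
            Class.forName("org.apache.hive.jdbc.HiveDriver");
            // Hypothetical HiveServer2 endpoint; 10000 is the conventional default port.
            String url = "jdbc:hive2://hive-server.example.com:10000/default";
            try (Connection conn = DriverManager.getConnection(url, "user", "");
                 Statement stmt = conn.createStatement();
                 // HiveQL looks like SQL; Hive turns it into MapReduce jobs.
                 ResultSet rs = stmt.executeQuery(
                         "SELECT word, COUNT(*) AS cnt FROM words GROUP BY word")) {
                while (rs.next()) {
                    System.out.println(rs.getString("word") + "\t" + rs.getLong("cnt"));
                }
            }
        }
    }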
Moving down the technology stack, we have:
Data storage: Remember that Hadoop out of the box is a batch processing system: we put data into HDFS and typically write it once and read it many times. But what if we need to get at specific data, or run a real-time processing system on top of Hadoop data? That is why we have column-oriented databases such as HBase. These are Apache projects, but the buzz term for them is NoSQL. "Not only SQL" does not mean you cannot use a SQL-like language to get data out; it means the underlying structures of the database are not strict like they are in the relational world. They are very loose and very flexible, which makes them very scalable: exactly what we need in the world of big data and Hadoop. In fact, there are a lot of NoSQL database platforms out there, and one of the most popular is MongoDB.
MongoDB is extremely popular, especially among programmers, because it is very easy to work with. It has a document-style storage model, which means programmers can take the data-model objects in their applications, serialize them right into MongoDB, and with the same ease bring them back into the application.
HBase is based on Google's Bigtable. It is a way to create tables containing millions of rows, put indexes on them, and do serious data analysis, with high-performance seeks to find the data we are looking for. A nice thing about HBase is that Pig and Hive natively work with it, so you can write Pig and Hive queries against data sitting in HBase.
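A minimal sketch of this kind of random, indexed access using the HBase Java client API (HBase 1.x style ConnectionFactory). The table name, column family and values are hypothetical; the point is the row-key based put and get, which HBase serves with fast seeks.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.util.Bytes;

    public class HBaseExample {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create(); // reads hbase-site.xml
            try (Connection conn = ConnectionFactory.createConnection(conf);
                 Table table = conn.getTable(TableName.valueOf("packages"))) {
                // Write one cell: row key "pkg-0001", column family "d", qualifier "status".
                Put put = new Put(Bytes.toBytes("pkg-0001"));
                put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("status"),
                              Bytes.toBytes("delivered"));
                table.put(put);
                // Random read (seek) by row key.
                Result r = table.get(new Get(Bytes.toBytes("pkg-0001")));
                System.out.println(Bytes.toString(
                        r.getValue(Bytes.toBytes("d"), Bytes.toBytes("status"))));
            }
        }
    }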
Cassandra is designed to handle large amounts of data across many commodity servers, providing high availability with no single point of failure. Cassandra offers robust support for clusters spanning multiple data centers. It has its roots in Amazon's storage designs and is built for real-time, interactive transaction processing on top of a Hadoop cluster. HBase and Cassandra solve different problems, but both provide fast seeks against our Hadoop data.
We also have a collection of projects spanning different categories; some solve specific business problems and some are a little more generic:
Lucene: Lucene provides full-text search: an API loaded with algorithms for standard full-text searching, wildcard searching, phrase searching, range searching, and the like.
Hama: Hama is for BSP (Bulk Synchronous Parallel) computing, used when we need to work with large amounts of scientific data.
HCatalog: A table metadata and storage management system. It is a way to create a shared schema so that tools like Pig and Hive can interoperate and have a consistent view of data across those tools.
Avro: A data serialization technology: a way to take data from an application and package it into a format that we can either store on disk or send across the wire, so that another application can unpack it and deserialize it into a format it understands. Avro is the more generic of the two serialization projects.
Thrift: More specific, aimed at cross-language compatibility when creating flexible schemas that work with Hadoop data. For example, we can build an application over Hadoop data in Java and then use the same objects inside an application built in Ruby, Python or C++.
Crunch: Crunch is for writing, testing and running MapReduce pipelines. It essentially gives you full control over the four phases: map, reduce, shuffle and combine. It helps with joins and aggregations that are very hard to do in low-level MapReduce, so Crunch makes life a little easier inside the MapReduce pipeline.
Data Intelligence: We also have data intelligence in the form of Drill and Mahout.
Drill: Drill is actually an incubator project and is designed to do interactive analysis on nested data.
Mahout: Mahout is a machine learning library that covers the three Cs:
1. Recommendation (Collaborative Filtering)
2. Clustering (which is a way to group related text and documents)
3. Classification (which is a way to categorize related text and documents).
Companies like Amazon use all of this for recommendations, much as music sites use it to recommend songs you might listen to, and also for predictive analysis.
Sqoop: On the left side of Figure 2 we have Sqoop. It is a widely popular project because it makes it easy to integrate Hadoop with relational systems. For instance, rather than leaving MapReduce results on HDFS, where they would require Pig or Hive queries, we can send those results to the relational world so that data professionals can do their own analysis and become part of the process. So Sqoop is popular for pushing Hadoop data into the relational world, but it is also popular for pushing data from the relational world into Hadoop, for example for archiving.
Flume and Chukwa: These are real-time log collection and processing tools. Applications, operating systems and services such as web servers generate mountains of log data; these tools let us push that information into Hadoop in real time and also perform real-time analysis.
On the right-hand side of Figure 3 we have tools for managing, monitoring and orchestrating everything that goes on in our cluster:
Oozie: A workflow library that allows us to connect and coordinate many of the essential projects, for instance Pig, Hive and Sqoop.
ZooKeeper: A distributed service coordinator. It is how we keep all the services running across the entire cluster in sync, handling synchronization and serialization, and it provides centralized management for those services.
Ambari: Ambari allows you to provision a cluster: we can pick services such as Pig, Hive, Sqoop and HBase and install them across all the nodes in the cluster. We can also manage our services from one centralized location (starting, stopping, reconfiguring) and monitor many of these projects from Ambari.
Figure 3. The Hadoop technology stack. At its base is Hadoop core/common, which consists of HDFS (distributed storage) and a programmable interface for accessing data stored in the cluster.
3.6 YARN (Yet Another Resource Negotiator)
YARN is the basis of MapReduce version 2. At the time of writing this is forthcoming work, currently in alpha: a rewrite of MapReduce 1.
IV. Inputs And Outputs
The MapReduce framework operates exclusively on <key, value> pairs: the framework views the input to the job as a set of <key, value> pairs [15] and produces the output of the job as a set of <key, value> pairs, conceivably of different types. The key and value classes have to be serializable by the framework and hence need to implement the Writable interface. Additionally, the key classes have to implement the WritableComparable interface to facilitate sorting by the framework.
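To illustrate, here is a minimal sketch of a custom key class implementing WritableComparable. The class, its fields and its sort order are hypothetical; built-in types such as Text and IntWritable already implement these interfaces, so custom keys are only needed for composite or specialized sorting.

    import java.io.DataInput;
    import java.io.DataOutput;
    import java.io.IOException;
    import org.apache.hadoop.io.WritableComparable;

    // Hypothetical composite key holding a word and a year.
    public class WordYearKey implements WritableComparable<WordYearKey> {
        private String word = "";
        private int year;

        public void set(String word, int year) { this.word = word; this.year = year; }

        @Override
        public void write(DataOutput out) throws IOException {
            out.writeUTF(word);   // serialize fields in a fixed order
            out.writeInt(year);
        }

        @Override
        public void readFields(DataInput in) throws IOException {
            word = in.readUTF();  // deserialize in the same order
            year = in.readInt();
        }

        @Override
        public int compareTo(WordYearKey other) {
            int c = word.compareTo(other.word);  // sort by word, then year
            return c != 0 ? c : Integer.compare(year, other.year);
        }

        @Override
        public int hashCode() {
            return word.hashCode() * 31 + year;  // used by the hash partitioner to route keys
        }
    }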
4.1 I/O TYPES OF A MAPREDUCE JOB:
Fig.4 Distribution of Input data
A simple MapReduce program can be written to determine how many times different words appear in a set of files. For example, suppose we have three files:

File1: Deer, Bear, River
File2: Car, Car, River
File3: Deer, Car, Bear

We can write a MapReduce program using three operations, map, combine and reduce, to compute the output.

The first step is the Map step, run once per file:

First Map: <Deer,1> <Bear,1> <River,1>
Second Map: <Car,1> <Car,1> <River,1>
Third Map: <Deer,1> <Car,1> <Bear,1>

The second step is the Combine step, which groups the pairs by key:

<Bear,1> <Bear,1>
<Car,1> <Car,1> <Car,1>
<Deer,1> <Deer,1>
<River,1> <River,1>

The final step is the Reduce step, which sums the counts for each key:

<Bear,2> <Car,3> <Deer,2> <River,2>
Fig.5 Example showing MapReduce Job
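The worked example above corresponds closely to the canonical word-count program written against the Hadoop Java MapReduce API. The following sketch follows that standard pattern (package and class names as in mainstream Hadoop 2.x releases); the input and output paths are supplied as command-line arguments.

    import java.io.IOException;
    import java.util.StringTokenizer;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class WordCount {
        public static class TokenizerMapper
                extends Mapper<Object, Text, Text, IntWritable> {
            private final static IntWritable one = new IntWritable(1);
            private final Text word = new Text();
            public void map(Object key, Text value, Context context)
                    throws IOException, InterruptedException {
                StringTokenizer itr = new StringTokenizer(value.toString());
                while (itr.hasMoreTokens()) {
                    word.set(itr.nextToken());
                    context.write(word, one);  // emit <word, 1>
                }
            }
        }

        public static class IntSumReducer
                extends Reducer<Text, IntWritable, Text, IntWritable> {
            private final IntWritable result = new IntWritable();
            public void reduce(Text key, Iterable<IntWritable> values, Context context)
                    throws IOException, InterruptedException {
                int sum = 0;
                for (IntWritable val : values) sum += val.get();
                result.set(sum);
                context.write(key, result);    // emit <word, total>
            }
        }

        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            Job job = Job.getInstance(conf, "word count");
            job.setJarByClass(WordCount.class);
            job.setMapperClass(TokenizerMapper.class);
            job.setCombinerClass(IntSumReducer.class);  // the Combine step shown above
            job.setReducerClass(IntSumReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }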
4.2 TASK EXECUTION AND ENVIRONMENT
The Task Tracker executes each Mapper/Reducer task as a child process in a separate JVM (Java Virtual Machine). The child task inherits the environment of the parent Task Tracker, and a user can specify environment variables controlling memory, parallel computation settings and segment size. Applications using MapReduce must specify the job configuration and input/output locations, and supply map and reduce functions via implementations of the appropriate interfaces and/or abstract classes.
4.3 SCHEDULING
Usually, Hadoop uses FIFO scheduling for jobs; the Capacity and Fair schedulers are the alternative options. Jobs are submitted to queues according to their priority, queues are allocated shares of the cluster's resource capacity, and free resources are allocated to queues beyond their total capacity.
V. Conclusion
From the topics discussed in detail, we come to know the concepts of Hadoop, MapReduce and the other elements of big data analysis. Doug Cutting, Cloudera's chief architect, helped create Apache Hadoop
out of necessity as data from the web exploded and grew far beyond the ability of traditional systems to handle
it. Hadoop was initially inspired by papers published by Google outlining its approach to handling an avalanche
of data, and has since become the standard for storing, processing and analyzing hundreds of terabytes and even
petabytes of data. Hadoop MapReduce is a broad-scale, open-source software framework devoted to scalable, distributed, data-intensive computing. The MapReduce framework breaks large data sets into smaller parallelizable chunks and handles scheduling. If you can rewrite your algorithms as Maps and Reduces, and your problem can be broken up into small pieces solvable in parallel, then Hadoop's MapReduce is the way to go for a distributed problem-solving approach to large datasets. The MapReduce framework is fault-tolerant, reliable, and supports thousands of nodes and petabytes of data. The future trend in big data is Apache Hadoop 2, the second iteration of the Hadoop framework for distributed data processing. And in today's hyper-connected world, where more and more data is created every day, Hadoop's breakthrough advantages mean that businesses and organizations can now find value in data that was recently considered useless.
REFERENCES
[1] M. A. Beyer and D. Laney, "The importance of 'big data': A definition," Gartner, Tech. Rep., 2012.
[2] X. Wu, X. Zhu, G. Q. Wu, et al., "Data mining with big data," IEEE Trans. on Knowledge and Data Engineering,
vol. 26, no. 1, pp. 97-107, January 2014; A. Rajaraman and J. D. Ullman, "Mining of massive datasets," Cambridge
University Press, 2012.
[3] Z. Zheng, J. Zhu, M. R. Lyu, "Service-generated Big Data and Big Data-as-a-Service: An Overview," in Proc. IEEE
BigData, pp. 403-410, October 2013; A. Bellogín, I. Cantador, F. Díez, et al., "An empirical comparison of social,
collaborative filtering, and hybrid recommenders," ACM Trans. on Intelligent Systems and Technology, vol. 4, no.
1, pp. 1-37, January 2013.
[4] W. Zeng, M. S. Shang, Q. M. Zhang, et al., "Can Dissimilar Users Contribute to Accuracy and Diversity of
Personalized Recommendation?," International Journal of Modern Physics C, vol. 21, no. 10, pp. 1217-1227, June
2010.
[5] T. C. Havens, J. C. Bezdek, C. Leckie, L. O. Hall, and M. Palaniswami, “Fuzzy c-Means Algorithms for Very Large
Data,” IEEE Trans. on Fuzzy Systems, vol. 20, no.6, pp. 1130-1146, December 2012.
[6] "HDFS Users Guide - Rack Awareness". Hadoop.apache.org. Retrieved 2013-10 17.
[7] "Cloud analytics: Do we really need to reinvent the storage stack?". IBM. June 2009.
[8] "HADOOP-6330: Integrating IBM General Parallel File System implementation of Hadoop File system
interface". IBM. 2009-10-23.
[9] "Refactor the scheduler out of the JobTracker". Hadoop Common. Apache Soft ware Foundation. Retrieved 9
June 2012.
[10] M. Tim Jones (6 December 2011). "Scheduling in Hadoop". ibm.com. IBM. Retrieved 20 November 2013.
[11] "Under the Hood: Hadoop Distributed File system reliability with Namenode and Avatarnode". Facebook. Retrieved
2012-09-13.
[12] "Under the Hood: Scheduling MapReduce jobs more efficiently with Corona". Facebook. Retrieved 2012-11-9.
[13] "Zettaset Launches Version 4 of Big Data Management Solution, Delivering New Stability for Hadoop Systems
and Productivity Boosting Features | | Zettaset.comZettaset.com". Zettaset.com. 2011-12-06. Retrieved 2012-05-23.
[14] Curt Monash. "More patent nonsense — Google MapReduce". dbms2.com. Retrieved 2010-03-07.
[15] D. Wegener, M. Mock, D. Adranale, and S. Wrobel, “Toolkit-Based High-Performance Data Mining of Large Data
on MapReduce Clusters,” Proc. Int’l Conf. Data Mining Workshops (ICDMW ’09), pp. 296-301, 2009
[16] J. Dean and S. Ghemawat, "Mapreduce: simplified data processing on large clusters," in Proceedings of the 6th
conference on Symposium on Operating Systems Design & Implementation - Volume 6, ser. OSDI'04.
[17] http://www.slideshare.net/mcsrivas/design-scale-and-performance-of-maprs-distribution-for-hadoop
[18] Z. Liu, P. Li, Y. Zheng, et al., “Clustering to find exemplar terms for keyphrase extraction,” in Proc. 2009
Conf. on Empirical Methods in Natural Language Processing, pp. 257-266, May 2009.
[19] X. Liu, G. Huang, and H. Mei, “Discovering homogeneous web service community in the user-centric
web environment,” IEEE Trans. on Services Computing, vol. 2, no. 2, pp. 167-181, April-June 2009.
[20] K. Zielinski, T. Szydlo, R. Szymacha, et al., "Adaptive SOA solution stack," IEEE Trans. on Services Computing,
vol. 5, no. 2, pp. 149-163, April-June 2012.
[21] F. Chang, J. Dean, S. Ghemawat, et al., "Bigtable: A distributed storage system for structured data," ACM Trans.
on Computer Systems, vol. 26, no. 2, pp. 1-39, June 2008.
[22] Z. Zheng, H. Ma, M. R. Lyu, et al., “QoS-aware Web service recommendation by collaborative filtering,” IEEE
Trans. on Services Computing, vol. 4, no. 2, pp.140-152, February 2011.
[23] R. S. Sandeep, C. Vinay, S. M. Hemant, “Strength and Accuracy Analysis of Affix Removal Stemming
Algorithms,” International Journal of Computer Science and Information Technologies, vol. 4, no. 2, pp. 265-269,
April 2013.
[24] V. Gupta, G. S. Lehal, "A Survey of Common Stemming Techniques and Existing Stemmers for Indian Languages,"
Journal of Emerging Technologies in Web Intelligence, vol. 5, no. 2, pp. 157-161, May 2013; A. Rodriguez, W. A.
Chaovalitwongse, L. Zhe, et al., "Master defect record retrieval using network-based feature association," IEEE
Trans. on Systems, Man, and Cybernetics, Part C: Applications and Reviews, vol. 40, no. 3, pp. 319-329, October
2010.
[25] T. Niknam, E. Taherian Fard, N. Pourjafarian, et al., “An efficient algorithm based on modified imperialist
competitive algorithm and K-means for data clustering,” Engineering Applications of Artificial Intelligence, vol 24,
no. 2, pp. 306-317, March 2011.
[26] M. J. Li, M. K. Ng, Y. M. Cheung, et al. “Agglomerative fuzzy k-means clustering algorithm with selection of
number of clusters,” IEEE Trans. on Knowledge and Data Engineering, vol. 20, no. 11, pp. 1519-1534,
November 2008.
[27] G. Thilagavathi, D. Srivaishnavi, N. Aparna, et al., “A Survey on Efficient Hierarchical Algorithm used in
Clustering,” International Journal of Engineering, vol. 2, no. 9, September 2013.
[28] C. Platzer, F. Rosenberg, and S. Dustdar, “Web service clustering using multidimensional angles as proximity
measures,” ACM Trans. on Internet Technology, vol. 9, no. 3, pp. 11:1-11:26, July, 2009.
[29] G. Adomavicius, and J. Zhang, “Stability of Recommendation Algorithms,” ACM Trans. on Information
Systems, vol. 30, no. 4, pp. 23:1-23:31, August 2012.
[30] J. Herlocker, J. A. Konstan, and J. Riedl, “An empirical analysis of design choices in neighborhood-based
collaborative filtering algorithms,” Information retrieval, vol. 5, no. 4, pp. 287-310, October 2002.
[31] Yamashita, H. Kawamura and K. Suzuki, “Adaptive Fusion Method for User-based and Item-based Collaborative
Filtering,” Advances in Complex Systems, vol. 14, no. 2, pp. 133-149, May 2011.
[32] D. Julie and K. A. Kumar, “Optimal Web Service Selection Scheme With Dynamic QoS Property Assignment,”
International Journal of Advanced Research In Technology, vol. 2, no. 2, pp. 69-75, May 2012.
[33] J. Wu, L. Chen, Y. Feng, et al., “Predicting quality of service for selection by neighborhood-based collaborative
filtering,” IEEE Trans. on Systems, Man, and Cybernetics: Systems, vol. 43, no. 2, pp. 428-439, March 2013.
[34] White, Tom (10 May 2012). Hadoop: The Definitive Guide. O'Reilly Media. p. 3. ISBN 978-1-4493-3877-0.
[35] "Applications and organizations using Hadoop". Wiki.apache.org. 2013-06-19. Retrieved 2013-10-17.
[36] "HDFS User Guide". Hadoop.apache.org. Retrieved 2012-05-23.
[37] "HDFS Architecture". Retrieved 1 September 2013.
[38] "Improving MapReduce performance through data placement in heterogeneous Hadoop Clusters"
(PDF). Eng.auburn.ed. April 2010.
[39] Y. Zhao, G. Karypis, and U. Fayyad, “Hierarchical clustering algorithms for document datasets,” Data
Mining and Knowledge Discovery, vol. 10, no. 2, pp.141-168, November 2005.