This document provides an overview of a course on fundamentals of big data. It outlines 5 course outcomes related to identifying big data characteristics, implementing Hadoop and MapReduce, analyzing structured and semi-structured data using Hive and Pig, using MongoDB for CRUD operations, and exploring visualization techniques. It also describes 2 course units, the first of which covers characteristics of big data like volume, velocity, and variety, as well as challenges. Traditional business intelligence is contrasted with big data approaches.
BUS105Business Information SystemsWorkshop Week 3.docxjasoninnes20
BUS105
Business Information
Systems
Workshop Week 3
Small and big Data Collection, Storage
and Management in Relation to
Information Systems
Copyright Notice
COPYRIGHT
COMMONWEALTH OF AUSTRALIA
Copyright Regulations 1969
WARNING
This material has been reproduced and communicated to you by or on behalf of Kaplan Higher
Education pursuant to Part VB of the Copyright Act 1968 (the Act). The material in
this communication may be subject to copyright under the Act. Any further reproduction
or communication of this material by you may be the subject of copyright protection under the Act.
Do not remove this notice
2
Lesson Learning Outcomes
1 Review different types of data
2 Contrast small and big data collection
3 Learn about data storage and management
4 Examine business case studies in relation to
the type of data requirements for particular
information systems
Splunk: Slicing Data for
Domino’s Pizza
• Watch the video on how Splunk is helping to improve
Domino’s business functions
https://www.youtube.com/watch?v=LXMjN6kVmUY
Q: What was the big event
that occurred in the US that
required many pizza orders?
https://www.youtube.com/watch?v=LXMjN6kVmUY
• Raw data (primary data)
– Numbers, words, symbols collected from a source
– Not cleaned or processed
– may have errors or outliers
• Metadata
– Data that provides information about other data
– “Metadata explains the origin, purpose, time, geographic
location, creator, access, and terms of use of the data.”
https://data.library.arizona.edu/data-management-tips/data-documentation-and-metadata
Glossary 1
LO1
https://data.library.arizona.edu/data-management-tips/data-documentation-and-metadata
• Metadata from a pdf file
Metadata Example
Glossary 2
LO1
• Structured data is formatted for use, has a well-defined data
structure, generally stored in rows and columns
- e.g. age (in years), first name (text), address (text),
income ($), etc. We will learn more about this in the
relational database section of the slides.
• Semi-structured data has some structure
- e.g. CSV files with comma separated data. XML and
JavaScript Object Notation, JSON, documents used to
exchange data to/from a web server
• Parse means to analyse (a string or text) into logical syntactic
components.
EMC Education Services (Eds.) 2015, Data Science and Big Data Analytics: Discovering, Analyzing, Visualizing and Presenting Data, John Wiley &
Sons, Indianapolis, US.
https://www.google.com/search?q=parsing+definition&ie=&oe=
https://en.wikipedia.org/wiki/JSON
https://www.google.com/search?q=parsing+definition&ie=&oe
Glossary 3
LO1
• Quasi-structured data textual data which has various
formats and takes effort to handle and analyse
– e.g. web clickstream data
• Unstructured data has no predefined data model, not
organised, may have multiple types of data
- e.g. data from thermostats, sensors, home electronic
devices, cars, images and soun ...
An Comprehensive Study of Big Data Environment and its Challenges.ijceronline
Big Data is a data analysis methodology enabled by recent advances in technologies and Architecture. Big data is a massive volume of both structured and unstructured data, which is so large that it's difficult to process with traditional database and software techniques. This paper provides insight to Big data and discusses its nature, definition that include such features as Volume, Velocity, and Variety .This paper also provides insight to source of big data generation, tools available for processing large volume of variety of data, applications of big data and challenges involved in handling big data
BUS105Business Information SystemsWorkshop Week 3.docxjasoninnes20
BUS105
Business Information
Systems
Workshop Week 3
Small and big Data Collection, Storage
and Management in Relation to
Information Systems
Copyright Notice
COPYRIGHT
COMMONWEALTH OF AUSTRALIA
Copyright Regulations 1969
WARNING
This material has been reproduced and communicated to you by or on behalf of Kaplan Higher
Education pursuant to Part VB of the Copyright Act 1968 (the Act). The material in
this communication may be subject to copyright under the Act. Any further reproduction
or communication of this material by you may be the subject of copyright protection under the Act.
Do not remove this notice
2
Lesson Learning Outcomes
1 Review different types of data
2 Contrast small and big data collection
3 Learn about data storage and management
4 Examine business case studies in relation to
the type of data requirements for particular
information systems
Splunk: Slicing Data for
Domino’s Pizza
• Watch the video on how Splunk is helping to improve
Domino’s business functions
https://www.youtube.com/watch?v=LXMjN6kVmUY
Q: What was the big event
that occurred in the US that
required many pizza orders?
https://www.youtube.com/watch?v=LXMjN6kVmUY
• Raw data (primary data)
– Numbers, words, symbols collected from a source
– Not cleaned or processed
– may have errors or outliers
• Metadata
– Data that provides information about other data
– “Metadata explains the origin, purpose, time, geographic
location, creator, access, and terms of use of the data.”
https://data.library.arizona.edu/data-management-tips/data-documentation-and-metadata
Glossary 1
LO1
https://data.library.arizona.edu/data-management-tips/data-documentation-and-metadata
• Metadata from a pdf file
Metadata Example
Glossary 2
LO1
• Structured data is formatted for use, has a well-defined data
structure, generally stored in rows and columns
- e.g. age (in years), first name (text), address (text),
income ($), etc. We will learn more about this in the
relational database section of the slides.
• Semi-structured data has some structure
- e.g. CSV files with comma separated data. XML and
JavaScript Object Notation, JSON, documents used to
exchange data to/from a web server
• Parse means to analyse (a string or text) into logical syntactic
components.
EMC Education Services (Eds.) 2015, Data Science and Big Data Analytics: Discovering, Analyzing, Visualizing and Presenting Data, John Wiley &
Sons, Indianapolis, US.
https://www.google.com/search?q=parsing+definition&ie=&oe=
https://en.wikipedia.org/wiki/JSON
https://www.google.com/search?q=parsing+definition&ie=&oe
Glossary 3
LO1
• Quasi-structured data textual data which has various
formats and takes effort to handle and analyse
– e.g. web clickstream data
• Unstructured data has no predefined data model, not
organised, may have multiple types of data
- e.g. data from thermostats, sensors, home electronic
devices, cars, images and soun ...
An Comprehensive Study of Big Data Environment and its Challenges.ijceronline
Big Data is a data analysis methodology enabled by recent advances in technologies and Architecture. Big data is a massive volume of both structured and unstructured data, which is so large that it's difficult to process with traditional database and software techniques. This paper provides insight to Big data and discusses its nature, definition that include such features as Volume, Velocity, and Variety .This paper also provides insight to source of big data generation, tools available for processing large volume of variety of data, applications of big data and challenges involved in handling big data
This is the BIg Data Presentation which I have submitted to the college. Big Data introduction and types of Big Data have been covered in this presentation.
This slide describes what is big data and what all are its sources. It further discusses the problems arose because of it and what is its solution.
In case of any doubts, you can contact me on
"brushupyourskills2201@gmail.com"
Every day we roughly create 2.5 Quintillion bytes of data; 90% of the worlds collected data has been generated only in the last 2 years. In this slide, learn the all about big data
in a simple and easiest way.
This is the BIg Data Presentation which I have submitted to the college. Big Data introduction and types of Big Data have been covered in this presentation.
This slide describes what is big data and what all are its sources. It further discusses the problems arose because of it and what is its solution.
In case of any doubts, you can contact me on
"brushupyourskills2201@gmail.com"
Every day we roughly create 2.5 Quintillion bytes of data; 90% of the worlds collected data has been generated only in the last 2 years. In this slide, learn the all about big data
in a simple and easiest way.
Welcome to WIPAC Monthly the magazine brought to you by the LinkedIn Group Water Industry Process Automation & Control.
In this month's edition, along with this month's industry news to celebrate the 13 years since the group was created we have articles including
A case study of the used of Advanced Process Control at the Wastewater Treatment works at Lleida in Spain
A look back on an article on smart wastewater networks in order to see how the industry has measured up in the interim around the adoption of Digital Transformation in the Water Industry.
Final project report on grocery store management system..pdfKamal Acharya
In today’s fast-changing business environment, it’s extremely important to be able to respond to client needs in the most effective and timely manner. If your customers wish to see your business online and have instant access to your products or services.
Online Grocery Store is an e-commerce website, which retails various grocery products. This project allows viewing various products available enables registered users to purchase desired products instantly using Paytm, UPI payment processor (Instant Pay) and also can place order by using Cash on Delivery (Pay Later) option. This project provides an easy access to Administrators and Managers to view orders placed using Pay Later and Instant Pay options.
In order to develop an e-commerce website, a number of Technologies must be studied and understood. These include multi-tiered architecture, server and client-side scripting techniques, implementation technologies, programming language (such as PHP, HTML, CSS, JavaScript) and MySQL relational databases. This is a project with the objective to develop a basic website where a consumer is provided with a shopping cart website and also to know about the technologies used to develop such a website.
This document will discuss each of the underlying technologies to create and implement an e- commerce website.
TECHNICAL TRAINING MANUAL GENERAL FAMILIARIZATION COURSEDuvanRamosGarzon1
AIRCRAFT GENERAL
The Single Aisle is the most advanced family aircraft in service today, with fly-by-wire flight controls.
The A318, A319, A320 and A321 are twin-engine subsonic medium range aircraft.
The family offers a choice of engines
Saudi Arabia stands as a titan in the global energy landscape, renowned for its abundant oil and gas resources. It's the largest exporter of petroleum and holds some of the world's most significant reserves. Let's delve into the top 10 oil and gas projects shaping Saudi Arabia's energy future in 2024.
Democratizing Fuzzing at Scale by Abhishek Aryaabh.arya
Presented at NUS: Fuzzing and Software Security Summer School 2024
This keynote talks about the democratization of fuzzing at scale, highlighting the collaboration between open source communities, academia, and industry to advance the field of fuzzing. It delves into the history of fuzzing, the development of scalable fuzzing platforms, and the empowerment of community-driven research. The talk will further discuss recent advancements leveraging AI/ML and offer insights into the future evolution of the fuzzing landscape.
Student information management system project report ii.pdfKamal Acharya
Our project explains about the student management. This project mainly explains the various actions related to student details. This project shows some ease in adding, editing and deleting the student details. It also provides a less time consuming process for viewing, adding, editing and deleting the marks of the students.
Hybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdffxintegritypublishin
Advancements in technology unveil a myriad of electrical and electronic breakthroughs geared towards efficiently harnessing limited resources to meet human energy demands. The optimization of hybrid solar PV panels and pumped hydro energy supply systems plays a pivotal role in utilizing natural resources effectively. This initiative not only benefits humanity but also fosters environmental sustainability. The study investigated the design optimization of these hybrid systems, focusing on understanding solar radiation patterns, identifying geographical influences on solar radiation, formulating a mathematical model for system optimization, and determining the optimal configuration of PV panels and pumped hydro storage. Through a comparative analysis approach and eight weeks of data collection, the study addressed key research questions related to solar radiation patterns and optimal system design. The findings highlighted regions with heightened solar radiation levels, showcasing substantial potential for power generation and emphasizing the system's efficiency. Optimizing system design significantly boosted power generation, promoted renewable energy utilization, and enhanced energy storage capacity. The study underscored the benefits of optimizing hybrid solar PV panels and pumped hydro energy supply systems for sustainable energy usage. Optimizing the design of solar PV panels and pumped hydro energy supply systems as examined across diverse climatic conditions in a developing country, not only enhances power generation but also improves the integration of renewable energy sources and boosts energy storage capacities, particularly beneficial for less economically prosperous regions. Additionally, the study provides valuable insights for advancing energy research in economically viable areas. Recommendations included conducting site-specific assessments, utilizing advanced modeling tools, implementing regular maintenance protocols, and enhancing communication among system components.
Forklift Classes Overview by Intella PartsIntella Parts
Discover the different forklift classes and their specific applications. Learn how to choose the right forklift for your needs to ensure safety, efficiency, and compliance in your operations.
For more technical information, visit our website https://intellaparts.com
About
Indigenized remote control interface card suitable for MAFI system CCR equipment. Compatible for IDM8000 CCR. Backplane mounted serial and TCP/Ethernet communication module for CCR remote access. IDM 8000 CCR remote control on serial and TCP protocol.
• Remote control: Parallel or serial interface.
• Compatible with MAFI CCR system.
• Compatible with IDM8000 CCR.
• Compatible with Backplane mount serial communication.
• Compatible with commercial and Defence aviation CCR system.
• Remote control system for accessing CCR and allied system over serial or TCP.
• Indigenized local Support/presence in India.
• Easy in configuration using DIP switches.
Technical Specifications
Indigenized remote control interface card suitable for MAFI system CCR equipment. Compatible for IDM8000 CCR. Backplane mounted serial and TCP/Ethernet communication module for CCR remote access. IDM 8000 CCR remote control on serial and TCP protocol.
Key Features
Indigenized remote control interface card suitable for MAFI system CCR equipment. Compatible for IDM8000 CCR. Backplane mounted serial and TCP/Ethernet communication module for CCR remote access. IDM 8000 CCR remote control on serial and TCP protocol.
• Remote control: Parallel or serial interface
• Compatible with MAFI CCR system
• Copatiable with IDM8000 CCR
• Compatible with Backplane mount serial communication.
• Compatible with commercial and Defence aviation CCR system.
• Remote control system for accessing CCR and allied system over serial or TCP.
• Indigenized local Support/presence in India.
Application
• Remote control: Parallel or serial interface.
• Compatible with MAFI CCR system.
• Compatible with IDM8000 CCR.
• Compatible with Backplane mount serial communication.
• Compatible with commercial and Defence aviation CCR system.
• Remote control system for accessing CCR and allied system over serial or TCP.
• Indigenized local Support/presence in India.
• Easy in configuration using DIP switches.
2. COURSE OUTCOMES
Students will be able to
CO1- Identify the characteristics and challenges of big data
analytics
CO2- Implement the Hadoop and MapReduce framework for
processing massive volume of data
CO3- Analyze the structured and semi structured data using
Hive and Pig
CO4- Implement CRUD operations effectively using MongoDB
and Report generation using Jaspersoft studio
C05- Explore the usage of Hadoop and its integration tools to
manage Big Data and Use Visualization Techniques
3. UNIT-1
Classification of Digital Data - Introduction to Big
Data: Characteristics – Evolution – Definition -
Challenges with Big Data - Other Characteristics of
Data - Why Big Data - Traditional Business Intelligence
versus Big Data - Data Warehouse and Hadoop
Environment.
Total Periods :9
5. Today we are surrounded by data. Such large amount of
data is generated from many sources. consider the
following:
Facebook hosts more than 240 billion photos,
695,000 status updates, growing the data @ of 7
petabytes per month
In Twitter, Every second 98,000+ tweets are posted
A single Jet engine can generate 10+terabytes of
data in 30 minutes of flight time. With many
thousand flights per day, generation of data reaches
up to many Petabytes
6. The New York Stock Exchange generates
about 4-5 terabytes of new trade data per day
Every hour Walmart handles more than 1
million customer transactions
Everyday we create 2.5 Quintillion bytes of
data;
90% of the data in the world today has been
created in the last two years alone
7. Introduction to Big Data
What is Data?
• The quantities, characters, or symbols on which operations are performed
by a computer, which may be stored and transmitted in the form of
electrical signals and recorded on magnetic, optical, or mechanical
recording media.
What is Big Data?
• Big Data is also data but with a huge size.
• Big Data is a term used to describe a collection of data that is huge in
volume and yet growing exponentially with time.
• In short such data is so large and complex that none of the traditional data
management tools are able to store it or process it efficiently.
9. Characteristics Of Big Data
1.Volume
2.Velocity
3.Variety
4.Veracity
1. Volume:
• Volume means “How much Data is generated”. Now-a-days,
Organizations or Human Beings or Systems are generating or getting
very vast amount of Data say TB(Tera Bytes) to PB(Peta Bytes) to Exa
Byte(EB) and more.
VOLUME= Very Large Amount of Data
10. 2. Velocity:
Velocity means “How fast produce Data”. Now-a-days, Organizations
or Human Beings or Systems are generating huge amounts of Data at
very fast rate.
VELOCITY=Produce data at very fast rate
3. Variety:
Variety means “Different forms of Data”. Now-a-days, Organizations or
Human Beings or Systems are generating very huge amount of data at
very fast rate in different formats. We will discuss in details about
different formats of Data soon.
VARIETY=Produce data in different formats
11. 4.Veracity
• Veracity means “The Quality or Correctness or Accuracy of Captured
Data”.
• Out of 4Vs, it is most important V for any Big Data Solutions.
• without Correct Information or Data, there is no use of storing large
amount of data at fast rate and different formats. That data should give
correct business value.
12. Types of Digital Data
1.Structured
2.Unstructured
3.Semi-structured
Structured
Any data that can be stored, accessed and processed in the
form of fixed format is termed as a 'structured' data.
we are foreseeing issues when a size of such data grows to a huge
extent, typical sizes are being in the range of multiple zettabytes.
13. • Examples Of Structured Data
An 'Employee' table in a database is an example of Structured Data
Employee_ID Employee_Name Gender Department Salary_In_lacs
2365 Rajesh Kulkarni Male Finance 650000
3398 Pratibha Joshi Female Admin 650000
7465 Shushil Roy Male Admin 500000
7500 Shubhojit Das Male Finance 500000
7699 Priya Sane Female Finance 550000
14. Unstructured
Any data with unknown form or the structure is classified as unstructured data.
In addition to the size being huge, un-structured data poses multiple challenges in terms
of its processing for deriving value out of it.
A typical example of unstructured data is a heterogeneous data source containing a
combination of simple text files, images, videos etc.
Now day organizations have wealth of data available with them but unfortunately, they
don't know how to derive value out of it since this data is in its raw form or unstructured
format.
• Examples Of Un-structured Data
The output returned by 'Google Search'
15. Semi-structured
Semi-structured data can contain both the forms of data.
We can see semi-structured data as a structured in form but it is actually not
defined with e.g. a table definition in relational DBMS.
Example of semi-structured data is a data represented in an XML file.
Examples Of Semi-structured Data
Personal data stored in an XML file-
<rec><name>Prashant Rao</name><sex>Male</sex><age>35</age></rec>
<rec><name>Seema R.</name><sex>Female</sex><age>41</age></rec>
<rec><name>Satish Mane</name><sex>Male</sex><age>29</age></rec>
<rec><name>Subrato Roy</name><sex>Male</sex><age>26</age></rec>
<rec><name>Jeremiah J.</name><sex>Male</sex><age>35</age></rec>
18. Business Intelligence vs Big Data
• Traditional BI methodology is based on the principle of
grouping all business data into a central server.
• A Big Data solution differs in many aspects to BI to use.
These are the main differences between
Big Data and Business Intelligence:
• In a Big Data environment, information is stored on a
distributed file system, rather than on a central server.
• It is a much safer and more flexible space.
• Big Data solutions carry the processing functions to the data,
rather than the data to the functions.
19. • Big Data can analyze data in different formats, both structured
and unstructured.
• Big Data solutions solve them by allowing a global analysis of
various sources of information.
• Data processed by Big Data solutions can be historical or come
from real-time sources.
• Big Data technology uses parallel mass processing (MPP)
concepts, which improves the speed of analysis.