BUS105
Business Information
Systems
Workshop Week 3
Small and big Data Collection, Storage
and Management in Relation to
Information Systems
Copyright Notice
COPYRIGHT
COMMONWEALTH OF AUSTRALIA
Copyright Regulations 1969
WARNING
This material has been reproduced and communicated to you by
or on behalf of Kaplan Higher
Education pursuant to Part VB of the Copyright Act 1968 (the
Act). The material in
this communication may be subject to copyright under the Act.
Any further reproduction
or communication of this material by you may be the subject of
copyright protection under the Act.
Do not remove this notice
Lesson Learning Outcomes
1 Review different types of data
2 Contrast small and big data collection
3 Learn about data storage and management
4 Examine business case studies in relation to
the type of data requirements for particular
information systems
Splunk: Slicing Data for
Domino’s Pizza
• Watch the video on how Splunk is helping to improve
Domino’s business functions
https://www.youtube.com/watch?v=LXMjN6kVmUY
Q: What was the big event
that occurred in the US that
required many pizza orders?
• Raw data (primary data)
– Numbers, words, symbols collected from a source
– Not cleaned or processed
– may have errors or outliers
• Metadata
– Data that provides information about other data
– “Metadata explains the origin, purpose, time, geographic
location, creator, access, and terms of use of the data.”
https://data.library.arizona.edu/data-management-tips/data-documentation-and-metadata
Glossary 1
LO1
• Metadata from a pdf file
Metadata Example
Glossary 2
LO1
• Structured data is formatted for use, has a well-defined data
structure, generally stored in rows and columns
- e.g. age (in years), first name (text), address (text),
income ($), etc. We will learn more about this in the
relational database section of the slides.
• Semi-structured data has some structure
- e.g. CSV files with comma-separated data, and XML and
JavaScript Object Notation (JSON) documents used to
exchange data to/from a web server
• Parse means to analyse (a string or text) into logical syntactic
components.
EMC Education Services (Eds.) 2015, Data Science and Big
Data Analytics: Discovering, Analyzing, Visualizing and
Presenting Data, John Wiley &
Sons, Indianapolis, US.
https://www.google.com/search?q=parsing+definition&ie=&oe=
https://en.wikipedia.org/wiki/JSON
Glossary 3
LO1
• Quasi-structured data is textual data that comes in various
formats and takes effort to handle and analyse
– e.g. web clickstream data
• Unstructured data has no predefined data model, is not
organised, and may contain multiple types of data
- e.g. data from thermostats, sensors, home electronic
devices, cars, images and sounds & pdf files.
EMC Education Services (Eds.) 2015, Data Science and Big
Data Analytics: Discovering, Analyzing,
Visualizing and Presenting Data, John Wiley & Sons,
Indianapolis, US.
https://commons.wikimedia.org/wiki/Neodythemis_hildebrandti
Numerical vs Categorical Data
LO1
Data
Numerical
(quantitative)
Discrete: takes numerical
values from counting
Continuous: takes numerical
values from measurements
Categorical
(qualitative)
Nominal : an identifier or label
and has no numerical meaning
Ordinal: categories that have a
meaningful rank (order)
Examples of Numerical and
Categorical Data
Data
Numerical
(quantitative)
Discrete: number of chairs in this room
Continuous: height
Categorical
(qualitative)
Nominal: colours, i.e. blue, green, yellow.....
Ordinal: risk, e.g.
1. High risk,
2. Medium risk
3. Low risk
Activity 1: Numerical and
Categorical Data
• Form groups and find more examples of the data types
Data
Numerical
(quantitative)
Discrete:
Continuous:
Categorical
(qualitative)
Nominal:
Ordinal:
• Suppose that you have been employed by bicycle hire
company Citibike to analyse bike trips made by customers
in 2018. Some of the questions you may have are:
• Where do the customers ride most often?
• How far do the customers ride?
• How old, on average, are the customers?
https://www.citibikenyc.com/
Q: What sort of data would
you collect and how much?
Who Wants to Ride Around
New York City?
Who Wants to Ride Around
New York City?
This is structured data.
Q: How do you think this customer data is collected?
• We obtained a data set of 12,677 trips taken in January
2018.
• Variables include
• Trip Duration (seconds)
• Start Time and Date
• Stop Time and Date
• Start Station Name
• End Station Name
• Station ID
• Station Lat/Long
• Bike ID
• User Type (Customer = 24-hour pass or 3-day pass user;
Subscriber = Annual Member)
• Gender (Zero=unknown; 1=male; 2=female)
• Year of Birth
https://data.world/citibikenyc/citibike-tripdata-january-2018
Q. What type of variables
are these?
Who Wants to Ride Around
New York City?
Activity 2: Contrast Small and
Big Data
LO2
• Watch the video and list four of the ten ways in
which small and big data differ
• Report back to class
https://www.youtube.com/watch?v=nh-FrpMqlIs
Small Data Summary
LO2
1. Goal: often for a very specific purpose
2. Location: usually stored in one place
3. Structure: more likely to be structured data
4. Data preparation: often handled by a single person
5. Longevity: may only be kept for 7 years
6. Measurements: usually taken by a smaller group or by one person/machine, and are consistent
7. Reproducibility: easier to reproduce
8. Stakes (cost): less expensive
9. Introspection: easier to interpret and data points clearer
10. Analysis: often easier to organise and analyse
Video on content from Jules Berman’s book Principles of Big Data: Preparing, Sharing, and Analyzing Complex Information:
https://www.youtube.com/watch?v=nh-FrpMqlIs
Big Data Summary
LO2
1. Goals: one may not know how they are going to use all of their big data
2. Location: in multiple places (servers)
3. Structure: all types (structured, semi-, quasi- and unstructured)
4. Data preparation: by several persons
5. Longevity: may be kept for much longer and possibly used across different projects, or linked to other data later
6. Measurements: by different persons/machines with different protocols
7. Reproducibility: more difficult to recover data if something goes wrong
8. Stakes (cost): can be expensive
9. Introspection: you may not be able to identify data type or use
10. Analysis: more complex, e.g. requires extraction, transformation, etc.
How Business Collects Customer
Big Data
Internally collected as:
• Sales data (transaction history, customer interaction)
• Customer feedback (e.g. Facebook)
Externally collected by:
• Directly asking
• Indirect tracking (emails, apps and third-party trackers)
• Websites, cookies and web beacons
• Adding other data sources to their own by
– purchasing third party data (e.g. from data
companies Acxiom and Oracle)
https://www.itchronicles.com/big-data/how-do-big-companies-collect-customer-data/
https://marketing.acxiom.com/rs/982-LRE-196/images/Acxiom
UK_Data_Source_Information-Privacy_LATEST.pdf
https://www.oracle.com/index.html
Activity 3: Quick Quiz
LO2
1. Big data is usually collected for one specific purpose.
a. True
b. False
2. Small data is usually stored in one place (on one computer or
server).
a. True
b. False
3. The Kaplan Information systems course code BUS105 is a:
a. Continuous numerical variable
b. Ordinal variable
c. Nominal variable
d. Discrete numerical variable
Storage of Data
LO3
• Data Lake
– Repository for large amounts of raw data from multiple sources and in many formats, some of which may not be useful
• Data warehouse
– A repository of data from various sources, partially re-organised, and used to support decision makers in the organisation
– Takes data from the data lake and transforms it
• Data mart
– A low-cost, scaled-down version of a data warehouse designed for the end-user needs in a strategic business unit (SBU) or a department
• Database
– Organised collection of structured data (relational) or specific semi-, quasi- and unstructured data (non-relational)
Big Data Storage and
Management Options
Top 10 Big Data Storage Companies
https://selecthub.com/big-data-storage-software/
We will learn more about
semi and unstructured data
management in week 8.
Relational Database Management Systems
• Database management system (DBMS)
– A set of tools to add, delete, access, modify, and analyse
stored data
Relational databases
• Data represented as two-dimensional tables with columns and
rows
Example: Microsoft Excel
Software for storing and finding data: MySQL, Microsoft Access, Google Spanner, MemSQL
http://bigdata-madesimple.com/relational-vs-non-relational-databases-part-1/
Non-Relational Database Management
Systems
Non-relational databases
• For big data and real-time web data
• Usually open source and work on a distributed (parallel)
data approach
General categories of non-relational databases:
Key-value stores for shopping cart, sensor data
Document stores for tweets, customer data, blog posts
Wide-column stores for time series, banking
Graph stores for networks, social connections
http://bigdata-madesimple.com/relational-vs-non-relational-databases-part-1/
https://stackoverflow.com/questions/35281066/neo4j-is-it-possible-to-visualise-a-simple-overview-of-my-database
Non-relational databases
NoSQL databases:
• Store data in a non-tabular form,
e.g. MongoDB (JSON), Neo4j, HBase
XML databases:
• Have an XML format,
e.g. Oracle Berkeley DB XML, eXist-db, BaseX
Non-Relational Database Management
Systems Cont.
Query Languages
• Query languages request information from databases.
• The query language and method used depend on the database.
• One of the oldest and most widely used query languages is
Structured Query Language (SQL), which is used with relational databases.
– SQL performs complicated searches using simple keywords, e.g.
• SELECT (specifies a desired attribute)
• FROM (specifies the table to be used)
• WHERE (specifies conditions to apply in the query)
• Other types: UnQL for NoSQL databases; XQuery and XQL for XML databases
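The SELECT, FROM and WHERE keywords can be tried out with a small sketch. It uses Python's built-in sqlite3 module and an in-memory database. The table name `trips`, its columns and its three rows are invented for illustration and are not taken from the Citibike file.

```python
import sqlite3

# In-memory relational database; nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Hypothetical table and rows, made up for this example.
conn.execute(
    "CREATE TABLE trips (bike_id INTEGER, user_type TEXT, duration_s INTEGER)"
)
conn.executemany(
    "INSERT INTO trips VALUES (?, ?, ?)",
    [(101, "Subscriber", 420), (102, "Customer", 1500), (103, "Subscriber", 960)],
)

# SELECT names the attributes, FROM names the table,
# WHERE keeps only the rows that meet the condition.
rows = conn.execute(
    "SELECT bike_id, duration_s FROM trips "
    "WHERE user_type = 'Subscriber' ORDER BY bike_id"
).fetchall()
print(rows)
conn.close()
```

Only the two Subscriber rows come back, each reduced to the two selected attributes.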
Activity 4: Review Quiz
Q1: SQL stands for:
a. Sequence query language
b. Structured query language
c. Semi query language
d. Social query language
Q2: Would you use a data mart across a large organisation or
just in a
department?
Q3: MongoDB is a
a. Relational database
b. Table
c. XML database
d. NoSQL database using JSON
Data Governance
• Data governance:
– The policies and processes for managing data and information
across an entire organisation for a specified time.
• Master data management
– How and where data is managed and maintained for the entire
organisation
• Roles and responsibilities
– Staff in charge of making policies and managing data
Example (see next slide)
• Cancer Institute NSW data governance policy
[Diagram labels: Data governance; Master data management; Roles and responsibilities]
http://databaseanswers.org/downloads/Data_Governance_by_Example.pdf
https://www.cancer.nsw.gov.au/getmedia/b6a63978-f588-493c-af45-ee4716a4066b/CINSW-data-governance-policy.PDF
Case Study: Cancer Institute NSW
Data Governance
• Extract from page 6 of the policy document
https://www.cancer.nsw.gov.au/getmedia/b6a63978-f588-493c-af45-ee4716a4066b/CINSW-data-governance-policy.PDF
Data Management Summary
LO3
Data management is how you:
– Organise, structure, and maintain the data
– Store, back up, and preserve data
– Prepare material for analysis, or to share with others
This Photo by Unknown Author is licensed under CC BY
Management is part of
governance (hence the
overlap)
http://archive.edrm.net/resources/edrm-white-paper-series/igrm-garp
https://creativecommons.org/licenses/by/3.0/
Activity 5: Data Governance
• Form groups, watch the video on data governance
and answer the questions below.
https://www.youtube.com/watch?v=t4IOS5csv40
Q1: Define data governance. Why do we need it?
Q2: What keywords came up in the video in relation to
data governance?
Q3: What are the three key components of data
governance? Can you explain them in your own words?
Data Documentation
• Data documentation is important for transparency.
• Methods include data dictionaries, schema, metadata
A data dictionary is a reference (document) of the
variables in a database.
– Defines the format necessary to enter the data into
the database, i.e. ranges, codes, decimal places
– Creates standard definitions for all attributes
– Provides organisational data resource inventory for
effective data management
Creating a Data Dictionary
Watch the video on creating a data dictionary.
https://www.youtube.com/watch?v=AeVJy-ow2b0
Do you understand these basic elements now?
Field name
Field size
Data type
Data format
Description
Example (optional)
See activity on next slide
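As a rough illustration of the basic elements listed above, a data dictionary can be sketched as a list of records, one per variable. The field names, sizes, formats and example values below are assumptions made for this sketch. Only the meanings of the gender codes and user types come from the Citibike variable list earlier in the slides.

```python
# Elements every entry should carry, following the list on the slide.
ELEMENTS = ["field_name", "field_size", "data_type",
            "data_format", "description", "example"]

# Hypothetical entries: names, sizes and examples are illustrative only.
data_dictionary = [
    {"field_name": "trip_duration", "field_size": 6, "data_type": "integer",
     "data_format": "whole seconds", "description": "Length of the trip",
     "example": "932"},
    {"field_name": "user_type", "field_size": 10, "data_type": "text",
     "data_format": "Customer or Subscriber",
     "description": "Pass holder (Customer) or annual member (Subscriber)",
     "example": "Subscriber"},
    {"field_name": "gender", "field_size": 1, "data_type": "integer",
     "data_format": "0, 1 or 2",
     "description": "0 = unknown, 1 = male, 2 = female", "example": "1"},
]

def is_complete(entry):
    """Return True when an entry defines every basic element."""
    return all(element in entry for element in ELEMENTS)

for entry in data_dictionary:
    print(entry["field_name"], "|", entry["data_type"], "|", entry["description"])
```

A check such as `is_complete` is one simple way to keep entries consistent as more variables are documented.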
Activity 6: Create a Simple Data
Dictionary for the Citibike Data
• Form a group
• Download the file ‘JC-201801-citibike-tripdata.xlsx’
• As a group, construct a simple data dictionary for at least
four variables in the Citibike data
• Report back to class
Case Study: H&R Block Partners
With Xero
LO3
• The video shows how H&R Block has adopted
Xero to customise service, given customer tax
data
• Click on link: Xero
• Xero partners dominate nominations for the
Australian Accounting Awards 2019
This Photo by Unknown Author is licensed under CC BY-SA
https://tv.xero.com/detail/videos/customer-stories/video/5764088895001/h-r-block:-year-round-revenue-with-xero?autoStart=true
http://www.staygeo.com/2015/07/guide-to-e-file-income-tax-returns.html
https://creativecommons.org/licenses/by-sa/3.0/
Case Study: Yamaha Partners With 2nd
Watch and AWS Cloud Services
“Established in 1960 as Yamaha International Corporation, Yamaha Corporation of America (YCA) offers a full line of musical instruments and audio/visual products to the U.S. market.”
Business Problem:
• Yamaha’s data management was based at a single data centre.
• All production, test, and development systems were running in a co-location arrangement at another data centre.
• Yamaha had an expensive 30-month replacement cycle for its leased hardware.
Solution:
• Yamaha migrated its data and some management to the AWS Cloud.
• The company 2nd Watch was hired to assist.
• The migration to AWS was timely.
• 2nd Watch provides ongoing management, optimisation and planning services.
https://aws.amazon.com/partners/apn-journal/all/yamaha-2nd-watch/
BUS105
Business Information
Systems
Workshop Week 7
Structured Data Management
(Introductory Analytics) Life Cycle
Workshop (Excel)
Lesson Learning Outcomes
1 Learn about the data analytics project
lifecycle
2 Do a hands-on exercise in Excel with
reference to LO1
3 Interpret results as required
Excel Workshop Week 7
Vehicle Cost Analysis
Commons.wikipedia.org
Business Question: How much does it cost
to run a bus service?
Intechen.com
Today’s Tasks
• Please download today’s data file now
BUS105_ProximityBus_for_week_7.xlsx
• You will be doing a hands-on Microsoft Excel cost analysis
exercise in order to answer the business question:
How much does it cost to run a bus service?
• General Excel instructions will be followed by your specific
instructions.
• At the same time we will be learning about the data
analytics lifecycle and referring to it every now and then.
Data Analytics Lifecycle
Business
Understanding
Data Understanding
Data Preparation
Data Modelling
Evaluation
Deployment
Kelleher, JD, MacNamee, B & D’Arcy, A 2015, Fundamentals of machine learning for predictive analytics, The MIT Press, Cambridge, Massachusetts, pp. 12-15.
Stage 1: Business Understanding
This is stage 1 of the data analytics lifecycle.
Some questions you should answer during this stage:
• What are your objectives/aims?
e.g. Is our bus company making a profit?
• What resources do you need to start the project?
e.g. Do we need an analyst? What software do we need?
• What are your business success criteria?
e.g. How can we maintain a good bus service and keep
costs below a certain level?
• In this workshop we will work with vehicle mileage and
cost data, draw charts and perform cost calculations
using Excel’s built-in functions.
Opening the Excel Data File
Double click on the BUS105_ProximityBus_for_week_7.xlsx file icon to open the file in Excel.
Stage 2: Data Understanding
• Questions to ask at this stage:
• What data have you got and is it complete?
e.g. Bus ID, cost per km, km driven...
• What was the source? e.g. Maintenance department
• What other data would be useful?
e.g. Bus ticket prices, number of passengers per day, …
• Do you have a description of the data?
e.g. a data dictionary or encyclopedia
This Photo by Unknown Author is
licensed under CC BY
http://opensource.org/node/688
https://creativecommons.org/licenses/by/3.0/
Instructions
Clicking on a cell
makes it active
• Use the mouse
OR
• Use the arrow keys
to move around
How to Select a Cell
Cell is active when a heavy
border surrounds it.
© 2017 Cengage Learning. All Rights Reserved. May not be
copied, scanned, or duplicated, in whole or in part, except
for use as permitted in a license distributed with a certain
product or service or otherwise on a password-protected
website for classroom use.
Ischool.utexus.edu
Cell A1
Instructions
To enter worksheet titles, numbers or text
– Open the file in Microsoft Excel
– Click on a cell to make the cell active
– Type desired text
– Press the ENTER key to complete the entry
– Move to the next cell of interest and repeat
Additional information:
(To copy and paste, use Ctrl+C and Ctrl+V as in Word)
Your instructions on next page…
How to Enter Items
Now Enter Text
You will notice that some information is missing.
Your Instructions
• Click on cell B3 and enter “Cost per Km”
• Move to cell C6 and enter the missing value 14949.00
• Move to cell C7 and enter the missing value 14905.00
• Move to cell E3 and replace “M cost” with “Mileage Cost”
Your File Should Look Like This
Are the first four columns complete?
Formulae With Simple Operators
Instructions
Using simple operators and relative cell reference
• Recall all formulae start with an = sign
• Simple operators for addition, subtraction, multiplication,
division and nth power are +, -, *, / and ^n where n is the
power, e.g. 5 squared is =5^2
• If we copy a formula by dragging the fill handle, the cell addresses
change relative to the new position. This is called relative cell referencing.
Your Instructions
• Obtain mileage cost: Go to cell E4 in the Mileage Cost column
of your worksheet, type “=B4*C4” and press ENTER
Fill Handle
Instructions:
How to copy a cell calculation to adjacent cells in a column or row
• With the cell containing the contents (e.g. E4), to fill down
the column, point to the fill handle to activate it (i.e. click
on the lower right hand corner of the active cell and a plus
sign should appear “+”)
• Hold on to the corner and drag the handle down the
column as required (i.e. to cell E12)
Your Instructions
• Copy the formula (using the fill handle) down the rest of
column E to cell E12
Mileage
Costs
Your File Should Look Like This
Using Simple Formulae Cont.
Your Instructions
• Obtain total cost: Go to cell F4 and type in “=D4 + E4”
ENTER
• Copy the formula (by dragging the cursor) down the rest
of column F to cell F12
• Obtain total cost per Km: Go to cell G4 and type in
“=F4/C4” ENTER
• Copy the formula (using the fill handle) down the rest of
column G to cell G12
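The three spreadsheet formulas above can be restated as a short sketch outside Excel. The two bus rows are invented figures, not values from the workshop file, and treating column D as the maintenance cost is an assumption based on the questions asked later in the workshop.

```python
# Hypothetical rows: (bus_id, cost_per_km, km_driven, maintenance_cost).
buses = [
    ("B701", 2.00, 10000.0, 5000.0),
    ("B702", 1.50, 12000.0, 3000.0),
]

def add_costs(bus):
    """Mirror the formulas typed into columns E, F and G for one row."""
    bus_id, cost_per_km, km_driven, maintenance_cost = bus
    mileage_cost = cost_per_km * km_driven        # column E: =B4*C4
    total_cost = maintenance_cost + mileage_cost  # column F: =D4+E4
    total_cost_per_km = total_cost / km_driven    # column G: =F4/C4
    return bus_id, mileage_cost, total_cost, total_cost_per_km

# Applying the same function to every row plays the role of the fill handle.
results = [add_costs(bus) for bus in buses]
for row in results:
    print(row)
```

Because the function always reads values from its own row, it behaves like a formula with relative cell references.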
Your File Should Look Like This
Finding Totals Using ‘SUM’
Instructions
To sum a column of numbers
– Click the first empty cell below the column of numbers to sum
– Click the AutoSum button on the HOME tab to display a formula in the formula bar and in the active cell, for example, =SUM(B4:B12)
Your Instructions
• Find column totals using the SUM formula in the “Totals” row (row 13)
• Highlight G4 to G12 and click on the decrease decimal places button in the Number menu. Reduce the decimals to 1 place.
Your File Should Look Like This
Stage 3: Data Preparation
• We have carried out a small amount of data preparation.
Some of the questions you should answer during this
stage:
• Have you considered your data storage and
maintenance capacity?
e.g. Do you need new software, cloud warehousing or
just a PC?
• Do you need to transform (data wrangling) or integrate
the data in any way?
e.g. Finding total cost, reducing numbers to one decimal place
Stage 4: Modelling
Questions for the modelling phase:
• What models will you use?
e.g. Descriptive, predictive analytics or AI techniques
• How will you train/test and assess the models?
e.g. You will need a training data set if you are going to
use machine learning
Let’s look at some basic summary statistics, average, max
and min.
https://www.sv-europe.com/crisp-dm-methodology/
Absolute Versus Relative
Addressing
Table 3-6 Examples of Absolute, Relative, and Mixed Cell References
Cell Reference | Type of Reference | Meaning
$B$4 | Absolute cell reference | Both column and row references remain the same when you copy this cell, because the cell references are absolute
B4 | Relative cell reference | Both column and row references are relative. When copied to another cell, both the column and row in the cell reference are adjusted to reflect the new location
B$4 | Mixed reference | This cell reference is mixed. The column reference changes when you copy this cell to another column because it is relative. The row reference does not change because it is absolute
$B4 | Mixed reference | This cell reference is mixed. The column reference does not change because it is absolute. The row reference changes when you copy this cell reference to another row because it is relative
Absolute Versus Relative Address
Instructions
To enter a formula containing absolute cell references
– Given a selected cell, enter the formula and then press the F4 key to change the most recently typed cell reference from a relative cell reference to an absolute cell reference
Your Instructions
– Go to cell A17 in your spreadsheet and type in “9”
– Calculate the average by dividing each total by 9 using absolute referencing: Go to cell B14 and type in “=B13/$A$17”
– Apply this to cells C14 to G14 using the fill handle and adjust the results to 2 decimal places
Find Max and Min
Instructions
To find the maximum or minimum of a range of cells type in
=max(start cell:end cell) for maximum
=min(start cell:end cell) for minimum
Your Instructions
• Fill in the “Highest” and “Lowest” values for each column in rows 15
and 16, e.g. =max(B4:B12) and =min(B4:B12) for column B
• If required, change all values so that 2 decimal places are
displayed
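The summary rows (total, average, highest and lowest) can be sketched in a few lines. The nine cost values are made up for illustration. The fixed divisor `n_buses` plays the role of the absolute reference $A$17, which stays the same whichever column is being averaged.

```python
# Hypothetical values for one column (e.g. nine maintenance costs).
column = [5000, 3000, 4200, 3900, 2500, 3600, 4100, 3800, 4400]

n_buses = 9                 # the value typed into A17; fixed for every column

total = sum(column)         # =SUM(B4:B12)
average = total / n_buses   # =B13/$A$17
highest = max(column)       # =MAX(B4:B12)
lowest = min(column)        # =MIN(B4:B12)

# Round only for display, as the "decrease decimal places" button does.
print(total, round(average, 2), highest, lowest)
```

Rounding at the display step keeps the stored values exact, which matches how Excel's decimal-place buttons behave.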
Your File Should Look Like This
Stage 5: Evaluation
Questions regarding the evaluation phase:
• How will you assess the results in terms of business
success criteria?
e.g. How are these results going to help the bus company?
• Have you reviewed all the modelling so far?
e.g. What other preliminary models can we learn from?
See evaluation activity on the next page
Activity 1: Evaluation
• Answer the following questions:
1. Which bus costs the most to run per km?
2. Which bus has driven the fewest
kilometres (lowest mileage)?
3. What is the lowest maintenance cost?
Stage 6: Deployment
Questions regarding the deployment phase:
• Next steps? Do you need to gather more data, carry out
another data mining project, or start deployment?
e.g. let’s try filtering and sorting values of interest
(see next page)
• How will you implement your findings?
e.g. find a way to reduce the cost of bus 701
Filtering
Instructions
• The editing menu has sort and filter commands
• To filter items based on a particular column, click on the column to be filtered
• Move your mouse to the editing menu and select filter; a small box with an arrow will appear at the top of the column
• Clicking on the arrow reveals the items in the list
• Unticking individual boxes hides (filters out) those items, and the filter box changes shape
• This command is good for removing BLANKS in data sets
• To display (unfilter) the list, click on “select all” in the list of filter boxes, and your original data should be displayed
Filtering
We want to filter out the costs per Km less than 1.80
Your Instructions
1. Click on the top of column B of your spreadsheet
2. Take the cursor to the editing menu and select filter
3. Click on the filter box in column B to reveal the details
of data in column B
4. Untick the boxes with values lower than 1.80 to hide
them
Notice that the row
with the minimum
is now hidden too
Filtered column
Your File Should Look Like This
We want to sort the mileage costs while keeping the other
row information consistent with those costs
Your Instructions
1. First unfilter column B by ticking the “select all” box in the
filter options
2. Copy just the values of the data block from cell A3 to G12 to
Sheet 1 by highlighting the data and pressing Ctrl+C
3. Click on cell A1 in Sheet 1, right click on your mouse and
select paste special, click on the values option and OK
4. Select column E, go to the sort menu and select “sort
smallest to largest”
The “Expand selection” box will appear; make sure the
expand selection option is checked and then press SORT
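The filtering and sorting steps can be sketched on a small list of rows. The three rows and their values are hypothetical. Sorting whole rows, rather than one column on its own, is what the "Expand selection" option achieves in Excel.

```python
# Hypothetical rows; each dict is one spreadsheet row.
rows = [
    {"bus": "B701", "cost_per_km": 2.10, "mileage_cost": 21000.0},
    {"bus": "B702", "cost_per_km": 1.75, "mileage_cost": 26250.0},
    {"bus": "B703", "cost_per_km": 1.80, "mileage_cost": 18000.0},
]

# Filter: hide rows whose cost per km is lower than 1.80.
filtered = [row for row in rows if row["cost_per_km"] >= 1.80]

# Sort smallest to largest by mileage cost. Whole rows move together,
# so each bus keeps its own values.
sorted_rows = sorted(rows, key=lambda row: row["mileage_cost"])

print([row["bus"] for row in filtered])
print([row["bus"] for row in sorted_rows])
```

Neither step changes the original list, which is similar to working on a pasted copy in Sheet 1.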
Sorting
Your Worksheet Sheet 1 should look like this
Pie chart: Total Cost per Bus as a Percentage of the Entire Total Cost
B701 15%, B702 12%, B703 11%, B704 11%, B705 7%, B706 10%, B707 11%, B708 11%, B709 12%
An alternative representation of total costs
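Percentages like those in the chart come from dividing each bus's total cost by the sum of all total costs. A minimal sketch follows, using four invented totals rather than the workshop figures.

```python
# Hypothetical total cost per bus (not the workshop data).
total_costs = {"B701": 30000.0, "B702": 24000.0, "B703": 22000.0, "B704": 24000.0}

grand_total = sum(total_costs.values())

# Share of the grand total, as a whole-number percentage.
shares = {bus: round(100 * cost / grand_total) for bus, cost in total_costs.items()}

for bus, share in shares.items():
    print(f"{bus}: {share}%")
```

With real data, rounded shares may add up to slightly more or less than 100.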
Activity 2: Interpretation
Answer these questions:
1. Which bus has the greatest mileage cost?
2. What is the maintenance cost of the bus of
interest in question 1?
3. Is it easier to interpret the table or pie chart?
Why?
BUS105
Business Information
Systems
Lesson week 8
Semi-structured and unstructured data
management
Lesson Learning Outcomes
1 Define semi-structured and unstructured data
2 Distinguish between the various NoSQL and
NewSQL databases
3 Learn about various software packages for
the management of semi-structured and
unstructured data
4 Evaluate case studies
5 Final discussion with your teacher of
individual report
Dark analytics: Analyzing
unstructured data
Did you know that 95% of data in the world is unstructured?
Watch the video on Dark Analytics
https://www.youtube.com/watch?v=X4f-GCGraXI
What sorts of data are really difficult to analyse?
Glossary 1
LO1
Recall that
• Semi-structured data has some structure
- e.g. CSV files with comma-separated data, and XML and
JavaScript Object Notation (JSON) documents used to
exchange data to/from a web server.
Note: some analysts consider .csv files to be structured data
• Unstructured data has no predefined data model, is not
organised, and may contain multiple types of data
- e.g. data from thermostats, sensors, home electronic
devices, cars, images and sounds & pdf files.
EMC Education Services (Eds.) 2015, Data Science and Big
Data Analytics: Discovering, Analyzing, Visualizing and
Presenting Data, John Wiley &
Sons, Indianapolis, US.
https://www.google.com/search?q=parsing+definition&ie=&oe=
https://en.wikipedia.org/wiki/JSON
Glossary 3
LO1
• Quasi-structured data is textual data that comes in various
formats and takes effort to handle and analyse
– e.g. web clickstream data
• Unstructured data has no predefined data model, is not
organised, and may contain multiple types of data
- e.g. data from thermostats, sensors, home electronic
devices, cars, images and sounds & pdf files.
EMC Education Services (Eds.) 2015, Data Science and Big
Data Analytics: Discovering, Analyzing,
Visualizing and Presenting Data, John Wiley & Sons,
Indianapolis, US.
https://commons.wikimedia.org/wiki/Neodythemis_hildebrandti
Why do we need non-relational
databases?
• Big data has driven the need for
• NoSQL databases
– For unstructured data
• NewSQL databases
– Bridging the gap between relational and NoSQL database design
• Note: Querying language/method depends on the
database used
This Photo by Unknown Author is licensed
under CC BY-NC
http://www.ksi.mff.cuni.cz/
https://creativecommons.org/licenses/by-nc/3.0/
Recall: NoSQL Databases
NoSQL (Not only SQL), i.e. non-relational databases
• Are used to manage unstructured & semi-structured data
• Sometimes called “Cloud” databases
• Usually open source
• Work on a distributed (parallel) data approach
• General categories of non-relational databases (DBs):
– Key-value DBs, e.g. shopping cart, sensor data
– Document DBs, e.g. tweets, customer data, blog posts
– Column-oriented DBs, e.g. time series, banking
– Graph DBs, e.g. networks, social connections
Coronel, C, and Morris, S 2019, Database Systems: Design, Implementation, & Management, 13th Edn., Cengage, Boston, USA.
Activity 1: Match database type and
application
Database types: Key-value DBs, Document DBs, Column-oriented DBs, Graph DBs
Applications: Shopping cart, Tweets, Networks, Time series, Sensor data, Banking, Blog posts, Social connections
Example of Key-Value Database
For example, student names and ages. The name is
used as the key.
Software
• Windows Azure
• Riak
• Redis
• Dynamo
https://www.c-sharpcorner.com/UploadFile/f0b2ed/introduction-of-nosql-database/
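The key-value idea can be illustrated with a plain Python dictionary standing in for the store. This is only a sketch of the access pattern, not the interface of any product listed above. The helper names `put` and `get` and the student names and ages are invented.

```python
# A dictionary standing in for a key-value store (illustration only).
store = {}

def put(key, value):
    """Save a value under a key, replacing any earlier value."""
    store[key] = value

def get(key, default=None):
    """Look a value up by its key; return default if the key is absent."""
    return store.get(key, default)

# The student's name is the key and the age is the value.
put("Alice", 21)
put("Ben", 23)

print(get("Alice"))
print(get("Zoe"))   # no such key, so the default (None) is returned
```

Lookups go straight from key to value, with no table structure or joins involved.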
Example: Document Database
• For example, student names, ages & salaries
• Each document has a unique key for searching
• Documents appear as JavaScript Object Notation (JSON)
files (semi-structured)
Software
• MongoDB
• RavenDb
• CouchDB
• OrientDB
https://www.c-sharpcorner.com/UploadFile/f0b2ed/introduction-of-nosql-database/
Example: JSON code
JSON format code examples that could be used to exchange data to or from a web server:
{"name": "John", "age": 30, "Car": "Ford"}
{"StreetNum": 5, "streetName": "King William", "Lanes": 4}
[Figure labels: key, value, colon (:), curly brace]
1. JSON objects are surrounded by curly braces {},
2. They are written in key & value pairs.
3. Keys must be strings, and values must be a valid JSON data type
(string (i.e. text), number, object, array, boolean or null).
4. Keys and values are separated by a colon.
5. Each key/value pair is separated by a comma.
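These rules can be checked with Python's standard `json` module. The sketch below is a small illustration, not part of the original slides: the first string follows the rules and parses, while the second breaks rule 3 because its key is a number rather than a string. Note that JSON requires straight double quotes, not the curly quotes that slide software often substitutes.

```python
import json

valid_text = '{"name": "John", "age": 30, "Car": "Ford"}'
invalid_text = '{30: "age"}'  # the key is a number, not a string

person = json.loads(valid_text)  # parse JSON text into a Python dict
name = person["name"]
age = person["age"]              # JSON numbers become int or float

try:
    json.loads(invalid_text)
    invalid_parsed = True
except json.JSONDecodeError:
    # The parser rejects text that breaks the JSON rules.
    invalid_parsed = False

round_trip = json.dumps(person)  # serialise the dict back to JSON text
```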
JavaScript: JSON and Ajax, 1998-2014 O’Reilly Media, Inc., available at
archive.oreilly.com/oreillyschool/courses/javascript2/Javascript%20JSON%20and%20Ajax%20v2.pdf
This work is licensed under a Creative Commons Attribution-
ShareAlike 3.0 Unported License.
https://en.wikipedia.org/wiki/JSON
Activity 2: JSON code
• Why are these incorrectly coded?
1. ("name": "John", "age": 30, "Car": "Ford")
2. {name: "John", age: 30, Car: "Ford"}
3. {"name": "age": 30, "Car": "Ford"}
4. {"name": "John", "age": 30, [Car]: [Ford]}
5. {"name": "John" "age": 30 "Car": "Ford"}
More about MongoDB
• A document database
• Documents do not have to conform to the
same structure (schema-less)
• Documents with similar types are stored in
collections, related collections are stored in a
DB
• The documents appear as JSON files to users
Coronel, C, and Morris, S 2019, Database Systems: Design, Implementation, & Management, 13th Edn., Cengage, Boston, USA.
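To illustrate the schema-less idea, the sketch below uses plain Python dictionaries and a list in place of a real collection. It does not use the MongoDB API; the helper functions, field names and values are all hypothetical.

```python
import json

# A "collection" of student documents. Each has a unique key (_id here),
# but the documents do not share the same set of fields.
students = [
    {"_id": 1, "name": "Alice", "age": 21},
    {"_id": 2, "name": "Bob", "age": 19, "salary": 42000},
    {"_id": 3, "name": "Carol", "hobbies": ["chess", "cycling"]},
]

def find_by_id(collection, doc_id):
    """Return the document with the given unique key, or None."""
    for doc in collection:
        if doc["_id"] == doc_id:
            return doc
    return None

def find_with_field(collection, field):
    """Return every document that contains the given field."""
    return [doc for doc in collection if field in doc]

bob = find_by_id(students, 2)
with_age = find_with_field(students, "age")
bob_as_json = json.dumps(bob)  # the document as JSON text
```

A relational table would need a column for every field in every row; here a document simply omits the fields it does not have.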
Example: Column-Oriented Database
• Same example in a row store (relational) and a column store
(non-relational)
• Software: Cassandra and HBase
Relational Table
Column-centric storage
Block 1 | 125670,145679,234466,785940,785840
Block 2 | Ma,Jimmy,Peter,Sundar,Jiping
Block 3 | 130,128,144,132,110
Block 4 | 85,78,88,82,70
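The conversion from rows to column blocks can be sketched in Python. The values are the ones listed in the blocks above; the column names (`id`, `name`, `measure_a`, `measure_b`) are placeholders, since the headings of the original table are not reproduced here.

```python
# Row store: each tuple is one complete record.
rows = [
    (125670, "Ma", 130, 85),
    (145679, "Jimmy", 128, 78),
    (234466, "Peter", 144, 88),
    (785940, "Sundar", 132, 82),
    (785840, "Jiping", 110, 70),
]

# Placeholder column names (assumed, not from the original table).
columns = ["id", "name", "measure_a", "measure_b"]

# Column store: one block per column, holding that column's values
# for every record in the same order as the rows.
blocks = {col: [row[i] for row in rows] for i, col in enumerate(columns)}

# An aggregate over one column only needs to read one block.
average_a = sum(blocks["measure_a"]) / len(blocks["measure_a"])
```

Reading a single block for an aggregate, rather than every full row, is the usual reason given for using column stores in analytical workloads.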
Activity 3: Column-Oriented Database
• Convert the subset of data from the Week 7 Excel file
(shown below) into column-centric format
Relational Table
Column-centric storage
Block 1 |
Block 2 |
Block 3 |
Block 4 |
Case study: Fraud detection using a
Graph Database
• Neo4j video on Fraud detection
• Watch the video and learn about graph database design
https://www.youtube.com/watch?v=ujimD6MP87I
Aggregate awareness
• Aggregate awareness means that the data is
grouped (or “aggregated”) around a central topic
• For example, data collected in connection with an
individual blog post, including
– Title, content, date posted
– Username, screen name
– Comments made on the post, etc.
• Key-value, document and column DBs are all
aggregate aware
Coronel, C, and Morris, S 2019, Database Systems: Design, Implementation, & Management, 13th Edn., Cengage, Boston, USA.
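As a sketch of an aggregate, the blog post example might be held as one nested document, so that everything about the post is read or written together. Python is used for illustration only, and all field names and values are invented.

```python
# One aggregate: everything connected with a single blog post.
blog_post = {
    "title": "Choosing a database",
    "content": "Some thoughts on relational and NoSQL stores.",
    "date_posted": "2019-03-01",
    "author": {"username": "jsmith", "screen_name": "Jo Smith"},
    "comments": [
        {"username": "akhan", "text": "Helpful summary."},
        {"username": "lchen", "text": "What about graph databases?"},
    ],
}

# Questions about the post are answered from the one aggregate,
# without joining separate post, user and comment tables.
comment_count = len(blog_post["comments"])
commenters = [comment["username"] for comment in blog_post["comments"]]
```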
NewSQL Databases
• Cloud-based to handle large amounts of data
• E.g. ClustrixDB, NuoDB
• Use SQL for queries
• Use massively parallel processing (MPP) for queries,
i.e. data is spread across multiple servers, each of which
processes its own data locally
• Key-value and column-oriented data stores
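The idea behind massively parallel processing can be sketched as a scatter-gather pattern: each server computes a partial result over its own data, and a coordinator combines the partial results. The sketch below runs in a single Python process, with lists standing in for servers and made-up order amounts. It is an illustration of the pattern only, not a description of how ClustrixDB or NuoDB are implemented.

```python
# Each inner list stands in for the rows held on one server (made-up data).
servers = [
    [120.0, 35.5, 60.0],
    [15.0, 200.0],
    [75.25, 10.0, 5.0, 40.0],
]

def local_aggregate(rows):
    """Work done on one server: (sum, count) over its own rows only."""
    return sum(rows), len(rows)

def combine(partials):
    """Work done by the coordinator: merge the per-server partial results."""
    total = sum(partial_sum for partial_sum, _ in partials)
    count = sum(partial_count for _, partial_count in partials)
    return total, count

# Scatter: every server aggregates locally. Gather: combine the partials.
partials = [local_aggregate(rows) for rows in servers]
total, count = combine(partials)
average = total / count
```

Only the small partial results travel to the coordinator, not the underlying rows.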
Case Study: Hit Labs
ClustrixDB customer success story
• Application: Hit Labs created the Bubble Group Messenger
App (for group messaging and group chat)
• It is free on iOS and Android devices
• Originally built on Amazon's Aurora
• Problem: Hit Labs wanted a database to support their rapid
user growth
BUS105Business Information SystemsWorkshop Week 3.docx

  • 1. BUS105 Business Information Systems Workshop Week 3 Small and big Data Collection, Storage and Management in Relation to Information Systems Copyright Notice COPYRIGHT COMMONWEALTH OF AUSTRALIA Copyright Regulations 1969 WARNING This material has been reproduced and communicated to you by or on behalf of Kaplan Higher Education pursuant to Part VB of the Copyright Act 1968 (the Act). The material in this communication may be subject to copyright under the Act. Any further reproduction
  • 2. or communication of this material by you may be the subject of copyright protection under the Act. Do not remove this notice 2 Lesson Learning Outcomes 1 Review different types of data 2 Contrast small and big data collection 3 Learn about data storage and management 4 Examine business case studies in relation to the type of data requirements for particular information systems Splunk: Slicing Data for Domino’s Pizza • Watch the video on how Splunk is helping to improve Domino’s business functions https://www.youtube.com/watch?v=LXMjN6kVmUY Q: What was the big event
  • 3. that occurred in the US that required many pizza orders? https://www.youtube.com/watch?v=LXMjN6kVmUY • Raw data (primary data) – Numbers, words, symbols collected from a source – Not cleaned or processed – may have errors or outliers • Metadata – Data that provides information about other data – “Metadata explains the origin, purpose, time, geographic location, creator, access, and terms of use of the data.” https://data.library.arizona.edu/data-management-tips/data- documentation-and-metadata Glossary 1 LO1 https://data.library.arizona.edu/data-management-tips/data- documentation-and-metadata • Metadata from a pdf file Metadata Example
  • 4. Glossary 2 LO1 • Structured data is formatted for use, has a well-defined data structure, generally stored in rows and columns - e.g. age (in years), first name (text), address (text), income ($), etc. We will learn more about this in the relational database section of the slides. • Semi-structured data has some structure - e.g. CSV files with comma separated data. XML and JavaScript Object Notation, JSON, documents used to exchange data to/from a web server • Parse means to analyse (a string or text) into logical syntactic components. EMC Education Services (Eds.) 2015, Data Science and Big Data Analytics: Discovering, Analyzing, Visualizing and Presenting Data, John Wiley & Sons, Indianapolis, US. https://www.google.com/search?q=parsing+definition&ie=&oe= https://en.wikipedia.org/wiki/JSON https://www.google.com/search?q=parsing+definition&ie=&oe Glossary 3
  • 5. LO1 • Quasi-structured data textual data which has various formats and takes effort to handle and analyse – e.g. web clickstream data • Unstructured data has no predefined data model, not organised, may have multiple types of data - e.g. data from thermostats, sensors, home electronic devices, cars, images and sounds & pdf files. EMC Education Services (Eds.) 2015, Data Science and Big Data Analytics: Discovering, Analyzing, Visualizing and Presenting Data, John Wiley & Sons, Indianapolis, US. https://commons.wikimedia.org/wiki/Neod ythemis_hildebrandti https://commons.wikimedia.org/wiki/Neodythemis_hildebrandti Numerical vs Categorical Data LO1 Data Numerical (quantitative)
  • 6. Discrete: takes numerical values from counting Continuous: takes numerical values from measurements Categorical (qualitative) Nominal : an identifier or label and has no numerical meaning Ordinal: categories that can be ranked (ordered) arbitrarily Examples of Numerical and Categorical Data Data Numerical (quantitative) Discrete: number of chairs in this room Continuous: height Categorical (qualitative) Nominal: colours, i.e. blue, green, yellow.....
  • 7. Ordinal: risk, e.g. 1. High risk, 2. Medium risk 3. Low risk Activity 1: Numerical and Categorical Data • Form groups and find more examples of the data types Data Numerical (quantitative) Discrete: Continuous: Categorical (qualitative) Nominal: Ordinal: • Suppose that you have been employed by bicycle hire company Citibike to analyse bike trips made by customers
  • 8. in 2018. Some of the questions you may have are: • Where do the customers ride most often? • How far do the customers ride? • How old, on average, are the customers? https://www.citibikenyc.com/ Q: What sort of data would you collect and how much? Who Wants to Ride Around New York City? https://www.citibikenyc.com/ Who Wants to Ride Around New York City? This is structured data. Q: How do you think this customer data is collected? • We obtained a data set of 12,677 trips taken in January 2018.
  • 9. • Variables include • Trip Duration (seconds) • Start Time and Date • Stop Time and Date • Start Station Name • End Station Name • Station ID • Station Lat/Long • Bike ID • User Type (Customer = 24-hour pass or 3-day pass user; Subscriber = Annual Member) • Gender (0 = unknown; 1 = male; 2 = female) • Year of Birth https://data.world/citibikenyc/citibike-tripdata-january-2018 Q. What type of variables are these? Who Wants to Ride Around New York City?
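A minimal sketch of how two of the earlier questions (average customer age and typical trip length) could be answered from the variables listed above. The three records, the field names and the helper functions are made up for illustration; they are not taken from the Citibike file, which would normally be loaded from the downloaded spreadsheet.

```python
from statistics import mean

# Hypothetical trip records using three of the listed variables.
# These values are invented for illustration only.
trips = [
    {"trip_duration_s": 600, "birth_year": 1988, "user_type": "Subscriber"},
    {"trip_duration_s": 1200, "birth_year": 1978, "user_type": "Customer"},
    {"trip_duration_s": 900, "birth_year": 1998, "user_type": "Subscriber"},
]


def average_age(records, year=2018):
    """Average age in the given year, derived from Year of Birth (a discrete numerical variable)."""
    return mean(year - r["birth_year"] for r in records)


def average_duration_minutes(records):
    """Average Trip Duration converted from seconds to minutes."""
    return mean(r["trip_duration_s"] for r in records) / 60


def count_by_user_type(records):
    """Count trips per User Type (a nominal categorical variable)."""
    counts = {}
    for r in records:
        counts[r["user_type"]] = counts.get(r["user_type"], 0) + 1
    return counts


print(average_age(trips))
print(average_duration_minutes(trips))
print(count_by_user_type(trips))
```

Numerical variables are averaged here, while the categorical variable is only counted, which mirrors the numerical versus categorical distinction from the earlier slides.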
  • 10. Activity 2: Contrast Small and Big Data LO2 • Watch the video and list four of the ten ways in which small and big data differ • Report back to class https://www.youtube.com/watch?v=nh-FrpMqlIs Small Data Summary LO2 1. Goal: often for a very specific purpose 2. Location: usually stored in one place 3. Structure: more likely to be structured data 4. Data preparation: often handled by a single person 5. Longevity: may only be kept for 7 years 6. Measurements: usually taken by a smaller group or one person/machine, and are consistent 7. Reproducibility: easier to reproduce
  • 11. 8. Stakes (cost): less expensive 9. Introspection: easier to interpret and data points clearer 10. Analysis: often easier to organise and analyse Video based on content from Jules Berman’s book Principles of Big Data: Preparing, Sharing, and Analyzing Complex Information https://www.youtube.com/watch?v=nh-FrpMqlIs Big Data Summary LO2 1. Goals: one may not know how all of the big data is going to be used 2. Location: in multiple places (servers) 3. Structure: all types (structured, semi-structured, quasi-structured and unstructured) 4. Data preparation: by several persons 5. Longevity: may be kept for much longer and possibly used across different projects, or linked to other data later 6. Measurements: by different persons/machines with different protocols
  • 12. 7. Reproducibility: more difficult to recover data if something goes wrong. 8. Stakes (cost): can be expensive 9. Introspection: you may not be able to identify data type or use 10. Analysis: more complex, e.g. requires extraction, transformation, etc. How Business Collects Customer Big Data Internally collected as: • Sales data (transaction history, customer interaction) • Customer feedback (e.g. Facebook) Externally collected by: • Directly asking • Indirect tracking (emails, apps and third-party trackers) • Websites, cookies and web beacons • Adding other data sources to their own by – purchasing third party data (e.g. from data companies Acxiom and Oracle)
  • 13. https://www.itchronicles.com/big-data/how-do-big-companies-collect-customer-data/ https://marketing.acxiom.com/rs/982-LRE-196/images/Acxiom UK_Data_Source_Information-Privacy_LATEST.pdf https://www.oracle.com/index.html Activity 3: Quick Quiz LO2 1. Big data is usually collected for one specific purpose. a. True b. False 2. Small data is usually stored in one place (on one computer or server). a. True b. False 3. The Kaplan Information systems course code BUS105 is a: a. Continuous numerical variable b. Ordinal variable c. Nominal variable d. Discrete numerical variable
  • 14. Storage of Data LO3 • Data lake – Repository for large amounts of raw data from multiple sources and in many formats, some of which may not be useful • Data warehouse – A repository of data from various sources, partially reorganised, and used to support decision makers in the organisation – Takes data from a data lake and transforms it • Data mart – A low-cost, scaled-down version of a data warehouse designed for the end-user needs in a strategic business unit (SBU) or a department • Database – Organised collection of structured data (relational) or of specific semi-structured, quasi-structured and unstructured data (non-relational) Big Data Storage and
  • 15. Management Options Top 10 Big Data Storage Companies https://selecthub.com/big-data-storage-software/ We will learn more about semi-structured and unstructured data management in week 8. Relational Database Management Systems • Database management system (DBMS) – A set of tools to add, delete, access, modify, and analyse stored data Relational databases • Data represented as two-dimensional tables with columns and rows Example of a two-dimensional table: a Microsoft Excel worksheet Software for storing and finding data: MySQL, Microsoft Access, Google Spanner, MemSQL http://bigdata-madesimple.com/relational-vs-non-relational-databases-part-1/
  • 16. Non-Relational Database Management Systems Non-relational databases • For big data and real-time web data • Usually open source and work on a distributed (parallel) data approach General categories of non-relational databases: Key-value stores for shopping cart, sensor data Document stores for tweets, customer data, blog posts Wide-column stores for time series, banking Graph stores for networks, social connections http://bigdata-madesimple.com/relational-vs-non-relational-databases-part-1/ https://stackoverflow.com/questions/35281066/neo4j-is-it-possible-to-visualise-a-simple-overview-of-my-database
  • 17. Non-relational databases NoSQL databases: • Store data in a non-tabular form, e.g. MongoDB (JSON), Neo4j, HBase XML databases: • Have an XML format, e.g. Oracle Berkeley DB XML, eXist-db, BaseX Non-Relational Database Management Systems Cont. http://bigdata-madesimple.com/relational-vs-non-relational-databases-part-1/ https://stackoverflow.com/questions/35281066/neo4j-is-it-possible-to-visualise-a-simple-overview-of-my-database Query Languages • Query languages request information from databases. • The query language and method used depend on the
  • 18. database used. • One of the oldest query languages is structured query language (SQL) for relational databases. – SQL does complicated searches using simple keywords, e.g. • SELECT (specifies a desired attribute) • FROM (specifies the table to be used) • WHERE (specifies conditions to apply in the query) Other types: • UnQL for NoSQL databases • XQuery, XQL for XML databases Activity 4: Review Quiz Q1: SQL stands for: a. Sequence query language b. Structured query language c. Semi query language d. Social query language Q2: Would you use a data mart across a large organisation or just in a
  • 19. department? Q3: MongoDB is a a. Relational database b. Table c. XML database d. NoSQL database using JSON Data Governance • Data governance: – The policies and processes for managing data and information across an entire organisation for a specified time. • Master data management – How and where data is managed and maintained for the entire organisation • Roles and responsibilities – Staff in charge of making policies and managing data Example (see next slide) • Cancer Institute NSW data governance policy
  • 20. Master data management Roles and responsibilities http://databaseanswers.org/downloads/Data_Governance_by_Example.pdf Data governance https://www.cancer.nsw.gov.au/getmedia/b6a63978-f588-493c-af45-ee4716a4066b/CINSW-data-governance-policy.PDF Case Study: Cancer Institute NSW Data Governance • Extract from page 6 of the policy document https://www.cancer.nsw.gov.au/getmedia/b6a63978-f588-493c-af45-ee4716a4066b/CINSW-data-governance-policy.PDF Data Management Summary LO3
  • 21. Data management is how you: – Organise, structure, and maintain the data – Store, back up, and preserve data – Prepare material for analysis, or to share with others Management is part of governance (hence the overlap) http://archive.edrm.net/resources/edrm-white-paper-series/igrm-garp Activity 5: Data Governance • Form groups, watch the video on data governance and answer the questions below. https://www.youtube.com/watch?v=t4IOS5csv40 Q1: Define data governance. Why do we need it? Q2: What keywords came up in the video in relation to data governance? Q3: What are the three key components of data
  • 22. governance? Can you explain them in your own words? https://www.youtube.com/watch?v=t4IOS5csv40 Data Documentation • Data documentation is important for transparency. • Methods include data dictionaries, schema, metadata A data dictionary is a reference (document) of the variables in a database. – Defines the format necessary to enter the data into the database, i.e. ranges, codes, decimal places – Creates standard definitions for all attributes – Provides organisational data resource inventory for effective data management Creating a Data Dictionary Watch the video on creating a data dictionary. https://www.youtube.com/watch?v=AeVJy-ow2b0 Do you understand these basic elements now? Field name Field size
  • 23. Data type Data format Description Example (optional) See activity on next slide https://www.youtube.com/watch?v=AeVJy-ow2b0 Activity 6: Create a Simple Data Dictionary for the Citibike Data • Form a group • Download the file ‘JC-201801-citibike-tripdata.xlsx’ • As a group, construct a simple data dictionary for at least four variables in the Citibike data • Report back to class Case Study: H&R Block Partner With Xero LO3 • The video shows how H&R Block has adopted
  • 24. Xero to customise service, given customer tax data • Click on link: Xero • Xero partners dominate nominations for the Australian Accounting Awards 2019 This Photo by Unknown Author is licensed under CC BY-SA https://tv.xero.com/detail/videos/customer- stories/video/5764088895001/h-r-block:-year-round-revenue- with-xero?autoStart=true http://www.staygeo.com/2015/07/guide-to-e-file-income-tax- returns.html https://creativecommons.org/licenses/by-sa/3.0/ Case Study: Yamaha Partner 2nd Watch and AWS Cloud Services “Established in 1960 as Yamaha International Corporation, Yamaha Corporation of America (YCA) offers a full line of musical instruments and audio/visual products to the U.S. market.” Business Problem: • Yamaha’s data management based at a single data centre. • All production, test, and development systems running in a co-
  • 25. location arrangement at another data centre. • Yamaha had an expensive 30-month replacement cycle for its leased hardware. Solution: • Yamaha migrated data & some management to the AWS Cloud • Company 2nd Watch was hired to assist. • The migration to AWS was timely. • 2nd Watch provides ongoing management, optimisation and planning services. https://aws.amazon.com/partners/apn-journal/all/yamaha-2nd-watch/
  • 26. BUS105 Business Information Systems Workshop Week 7 Structured Data Management (Introductory Analytics) Life Cycle Workshop (Excel) Copyright Notice
  • 27. COPYRIGHT COMMONWEALTH OF AUSTRALIA Copyright Regulations 1969 WARNING This material has been reproduced and communicated to you by or on behalf of Kaplan Higher Education pursuant to Part VB of the Copyright Act 1968 (the Act). The material in this communication may be subject to copyright under the Act. Any further reproduction or communication of this material by you may be the subject of copyright protection under the Act. Do not remove this notice 2 Lesson Learning Outcomes
  • 28. 1 Learn about the data analytics project lifecycle 2 Do a hands-on exercise in excel with reference to LO1 3 Interpret results as required Excel Workshop Week 7 Vehicle Cost Analysis Commons.wikipedia.org Business Question: How much does it cost to run a bus service? Intechen.com
  • 29. Today’s Tasks • Please download today’s data file now BUS105_ProximityBus_for_week_7.xlsx • You will be doing a hands-on Microsoft Excel cost analysis exercise in order to answer the business question: How much does it cost to run a bus service? • General Excel instructions will be followed by your specific instructions. • At the same time we will be learning about the data analytics lifecycle and referring to it every now and then. Data Analytics Lifecycle
  • 30. Diagram labels: Business Understanding, Data Understanding, Data Preparation, Data Modelling, Evaluation, Deployment, Data. Kelleher, JD, Mac Namee, B & D’Arcy, A 2015, Fundamentals of Machine Learning for Predictive Data Analytics, The MIT Press, Cambridge, Massachusetts, pp. 12-15.
  • 31. Stage 1: Business Understanding This is stage 1 of the data analytics lifecycle. Some questions you should answer during this stage: • What are your objectives/aims? e.g. Is our bus company making a profit? • What resources do you need to start the project? e.g. Do we need an analyst? What software do we need? • What are your business success criteria? e.g. How can we maintain a good bus service and keep costs below a certain level?
  • 32. • In this workshop we will work with vehicle mileage and cost data, draw charts and perform cost calculations using Excel’s built-in functions. Opening the Excel Data File Double click on the BUS105_ProximityBus_for_week_7.xlsx file icon to open the file in Excel. Data Understanding • Questions to ask at this stage:
  • 33. • What data have you got and is it complete? e.g. Bus ID, cost per km, km driven... • What was the source? e.g. Maintenance department • What other data would be useful? e.g. Bus ticket prices, number of passengers per day, ... • Do you have a description of the data? e.g. a data dictionary or encyclopedia Instructions Clicking on a cell
  • 34. makes it active • Use the mouse OR • Use the arrow keys to move around How to Select a Cell Cell is active when a heavy border surrounds it. © 2017 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license distributed with a certain product or service or otherwise on a password-protected website for classroom use.
  • 35. Ischool.utexus.edu Cell A1 Instructions To enter worksheet titles, numbers or text – Open the file in Microsoft Excel – Click on a cell to make the cell active – Type desired text – Click the ENTER button to complete the entry – Move to the next cell of interest and repeat Additional information: (To copy and paste, use Ctrl+C and Ctrl+V as in Word) Your instructions on next page…
  • 36. How to Enter Items Now Enter Text You will notice that some information is missing. Your Instructions • Click on cell B3 & enter “Cost per Km” • Move to cell C6 and enter the missing value 14949.00 • Move to cell C7 and enter the missing value
  • 37. 14905.00 • Move to cell E3 and replace “M cost” with “Mileage Cost” Your File Should Look Like This Are the first four columns complete? Formulae With Simple Operators Instructions Using simple operators and relative cell reference • Recall all formulae start with an = sign • Simple operators for addition, subtraction, multiplication,
  • 38. division and nth power are +, -, *, / and ^n where n is the power, e.g. 5 squared is =5^2 • If we drag the cursor along, the cell addresses change relative to the new position. This is called relative cell referencing. Your Instructions • Obtain mileage cost: Go to cell E4 in the Mileage Cost column of your worksheet, type “=B4*C4” and press ENTER Fill Handle Instructions: How to copy a cell calculation to adjacent cells in a col/row • With the cell containing the contents (e.g. E4), to fill down the column, point to the fill handle to activate it (i.e. click
  • 39. on the lower right hand corner of the active cell and a plus sign should appear “+”) • Hold on to the corner and drag the handle down the column as required (i.e. to cell E12) Your Instructions • Copy the formula (using the fill handle) down the rest of column E to cell E12 Mileage Costs Your File Should Look Like This
  • 40. Using Simple Formulae Cont. Your Instructions • Obtain total cost: Go to cell F4 and type in “=D4 + E4” ENTER • Copy the formula (by dragging the cursor) down the rest of column F to cell F12 • Obtain total cost per Km: Go to cell G4 and type in “=F4/C4” ENTER • Copy the formula (using the fill handle) down the rest of column G to cell G12 Your File Should Look Like This
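The three worksheet formulas used so far (=B4*C4, =D4+E4 and =F4/C4) can be sketched outside Excel as follows. This is only an illustration under stated assumptions: the two rows of figures are invented (they are not the ProximityBus data), the function name is hypothetical, and column D is assumed to hold the maintenance cost.

```python
# Hypothetical rows: (bus_id, cost_per_km, km_driven, maintenance_cost).
# The figures are made up for illustration; they are not the workshop data.
buses = [
    ("B701", 2.0, 1000.0, 500.0),
    ("B702", 1.5, 2000.0, 400.0),
]


def cost_columns(cost_per_km, km_driven, maintenance_cost):
    """Reproduce worksheet columns E, F and G for one row."""
    mileage_cost = cost_per_km * km_driven         # column E: =B4*C4
    total_cost = maintenance_cost + mileage_cost   # column F: =D4+E4 (D assumed to be maintenance cost)
    total_cost_per_km = total_cost / km_driven     # column G: =F4/C4
    return mileage_cost, total_cost, total_cost_per_km


# Looping over the rows plays the role of dragging the fill handle down:
# the same formula is applied to each row's own cells (relative referencing).
for bus_id, cost_per_km, km_driven, maintenance_cost in buses:
    e, f, g = cost_columns(cost_per_km, km_driven, maintenance_cost)
    print(bus_id, e, f, round(g, 2))
```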
  • 41. Finding Totals Using ‘SUM’ Instructions To sum a column of numbers – Click the first empty cell below the column of numbers to sum – Click the AutoSum button on the HOME tab to display a formula in the formula bar and in the active cell, for example =SUM(B4:B12) Your Instructions • Highlight G4 to G12 and click on the decrease decimal places button in the Number menu. Reduce the decimals to 2 places. • Sum the columns using the sum formula in the “totals” row (row 13)
  • 42. Instructions To sum a column of numbers – Click the first empty cell below the column of numbers to sum – Click the AutoSum button on the HOME tab to display a formula in the formula bar and in the active cell, for example, =SUM(B4:B12) Your Instructions • Find column totals using the SUM formula in the “Totals” row (row 13) • Highlight G4 to G12 and click on the decrease decimal places button in the Number menu. Reduce the decimals to 1 place.
  • 43. Finding Totals Using ‘SUM’ Your File Should Look Like This Data Preparation • We have carried out a small amount of data preparation. Some of the questions you should answer during this stage: • Have you considered your data storage and maintenance capacity? e.g. Do you need new software, cloud warehousing or just a PC? • Do you need to transform (data wrangling) or integrate
  • 44. the data in any way? e.g. Finding total cost, reducing numbers to one decimal Stage 4: Modelling Questions for the modelling phase: • What models will you use? e.g. Descriptive, predictive analytics or AI techniques • How will you train/test and assess the models? e.g. You will need a training data set if you are going to use machine learning Let’s look at some basic summary statistics, average, max and min.
  • 45. https://www.sv-europe.com/crisp-dm-methodology/ Absolute Versus Relative Addressing Table 3-6 Examples of Absolute, Relative, and Mixed Cell References
    Cell Reference | Type of Reference | Meaning
    $B$4 | Absolute cell reference | Both column and row references remain the same when you copy this cell, because the cell references are absolute
    B4 | Relative cell reference | Both column and row references are relative. When copied to another cell, both the column and row in the cell reference are adjusted to reflect the new location
  • 46. B$4 | Mixed reference | This cell reference is mixed. The column reference changes when you copy this cell to another column because it is relative. The row reference does not change because it is absolute
    $B4 | Mixed reference | This cell reference is mixed. The column reference does not change because it is absolute. The row reference changes when you copy this cell reference to another row because it is relative
    Absolute Versus Relative Address
  • 47. Instructions To enter a formula containing absolute cell references – Given a selected cell, enter the formula and then press the F4 key to change the most recently typed cell reference from a relative cell reference to an absolute cell reference Your Instructions – Go to cell A17 in your spreadsheet and type in “9” – Calculate the average using each total divided by 9 using absolute referencing: Go to cell B14 and type in “=B13/$A$17” – Apply this to cells C14 to G14 using the fill handle and adjust the results to 2 decimal places
  • 48. Find Max and Min Instructions To find the maximum or minimum of a range of cells type in =max(start cell:end cell) for maximum =min(start cell:end cell) for minimum Your Instructions • Fill in the “Highest” and “Lowest” column values in row 15 and 16, using =max(B4:B12) and =min(B4:B12) • If required, change all values so that 2 decimal places are displayed
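A minimal sketch of the summary rows built in the last few steps: Totals (=SUM), the average obtained by dividing each total by the fixed count in $A$17, and Highest and Lowest (=MAX, =MIN). The column values and the function name are assumptions made for illustration; they are not the workshop figures.

```python
# Hypothetical values for one worksheet column (e.g. B4:B12 would hold nine values).
column = [2.0, 1.5, 2.5]


def summarise(values, count):
    """Return the Totals, average, Highest and Lowest rows for one column."""
    total = sum(values)        # =SUM(B4:B12)
    average = total / count    # =B13/$A$17, where A17 holds the fixed count
    return {
        "total": total,
        "average": round(average, 2),  # displayed to 2 decimal places
        "highest": max(values),        # =MAX(B4:B12)
        "lowest": min(values),         # =MIN(B4:B12)
    }


# The count is passed in once and reused for every column, which is what the
# absolute reference $A$17 achieves when the formula is copied across the row.
print(summarise(column, count=len(column)))
```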
  • 49. Your File Should Look Like This Stage 5: Evaluation Questions regarding the evaluation phase: • How will you assess the results in terms of business success criteria? e.g. How are these results going to help the bus company? • Have you reviewed all the modelling so far? e.g. What other preliminary models can we learn from? See evaluation activity on the next page Activity 1: Evaluation
  • 50. • Answer the following questions: 1. Which bus costs the most to run per km? 2. Which bus has driven the fewest kilometres (lowest mileage)? 3. What is the lowest maintenance cost? Stage 6: Deployment Questions regarding the deployment phase: • Next steps? Do you need to gather more data, carry out another data mining project, or start deployment? e.g. let’s try filtering and sorting values of interest (see next page)
  • 51. • How will you implement your findings? e.g. find a way to reduce the cost of bus 701 Filtering Instructions Filtering • The editing menu has sort and filter commands • To filter items based on a particular column, click on the column to be filtered • Move your mouse to the editing menu and select filter; a small box with an arrow will appear at the top of the column • Clicking on the arrow reveals the items in the list • Unticking individual boxes hides (filters out) those items, and the filter box changes shape • This command is good for removing BLANKS in data sets
  • 52. • To display (unfilter) the list click on “select all” in the list of filter boxes, and your original data should be displayed Filtering We want to filter out the costs per Km less than 1.80 Your Instructions 1. Click on the top of column B of your spreadsheet 2. Take the cursor to the editing menu and select filter 3. Click on the filter box in column B to reveal the details of data in column B 4. Untick the boxes with values lower than 1.80 to hide
  • 53. them Notice that the row with the minimum is now hidden too Filtered column Your File Should Look Like This We want to sort the mileage costs while keeping the other row information consistent with those costs Your Instructions 1. First unfilter column B by ticking the “select all” box in the filter options
  • 54. 2. Copy just the values of the data block from cell A3 to G12 to Sheet 1 by highlighting the data and pressing Ctrl+C 3. Click on cell A1 in Sheet 1, right click and select paste special, click on the values option and OK 4. Select column E, go to the sort menu and select “sort smallest to largest”. The “Expand selection” box will appear; make sure the expand selection option is checked and then press SORT Sorting Sorting Your Worksheet Sheet 1 should look like this
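The filtering and sorting steps can be sketched in code as follows. The rows, field names and function names are hypothetical and invented for illustration; only the 1.80 threshold and the smallest-to-largest sort on mileage cost come from the instructions.

```python
# Hypothetical worksheet rows; the figures are made up for illustration.
rows = [
    {"bus": "B701", "cost_per_km": 2.10, "mileage_cost": 31000.0},
    {"bus": "B702", "cost_per_km": 1.75, "mileage_cost": 26000.0},
    {"bus": "B703", "cost_per_km": 1.80, "mileage_cost": 27000.0},
]


def filter_out_low_cost(data, threshold=1.80):
    """Hide rows whose cost per km is below the threshold, like unticking those filter boxes."""
    return [row for row in data if row["cost_per_km"] >= threshold]


def sort_by_mileage_cost(data):
    """Sort smallest to largest on mileage cost.

    Whole rows are moved together, which is what the "Expand selection"
    option does: the other columns stay consistent with the sorted column.
    """
    return sorted(data, key=lambda row: row["mileage_cost"])


print([row["bus"] for row in filter_out_low_cost(rows)])
print([row["bus"] for row in sort_by_mileage_cost(rows)])
```

Both functions return new lists and leave `rows` unchanged, much as the worksheet steps copy the values to Sheet 1 before sorting so the original data stays intact.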
  • 55. Total Cost per Bus as a Percentage of the Entire Total Cost (pie chart): B701 15%, B702 12%, B703 11%, B704 11%, B705 7%, B706 10%, B707 11%, B708 11%, B709 12%
  • 56. An alternative representation of total costs Activity 2: Interpretation Answer these questions: 1. Which bus has the greatest mileage cost? 2. What is the maintenance cost of the bus of interest in question 1?
  • 57. 3. Is it easier to interpret the table or pie chart? Why? BUS105 Business Information Systems Lesson week 8 Semi-structured and unstructured data management Lesson Learning Outcomes 1 Define semi-structured and unstructured data
  • 58. 2 Distinguish between the various NoSQL and NewSQL databases 3 Learn about various software packages for the management of semi-structured and unstructured data 4 Evaluate case studies 5 Final discussion with your teacher of individual report Dark analytics: Analyzing unstructured data Did you know that 95% of data in the world is unstructured? Watch the video on Dark Analytics
  • 59. https://www.youtube.com/watch?v=X4f-GCGraXI What sorts of data are really difficult to analyse? Glossary 1 LO1 Recall that • Semi-structured data has some structure - e.g. CSV files with comma-separated data, and XML & JavaScript Object Notation (JSON) documents used to exchange data to/from a web server. Note: some analysts consider .csv files to be structured data • Unstructured data has no predefined data model, is not organised and may contain multiple types of data
  • 60. - e.g. data from thermostats, sensors, home electronic devices, cars, images and sounds & pdf files. EMC Education Services (Eds.) 2015, Data Science and Big Data Analytics: Discovering, Analyzing, Visualizing and Presenting Data, John Wiley & Sons, Indianapolis, US. https://www.google.com/search?q=parsing+definition&ie=&oe= https://en.wikipedia.org/wiki/JSON Glossary 3 LO1 • Quasi-structured data textual data which has various formats and takes effort to handle and analyse – e.g. web clickstream data
  • 61. • Unstructured data has no predefined data model, is not organised and may contain multiple types of data - e.g. data from thermostats, sensors, home electronic devices, cars, images and sounds & pdf files. EMC Education Services (Eds.) 2015, Data Science and Big Data Analytics: Discovering, Analyzing, Visualizing and Presenting Data, John Wiley & Sons, Indianapolis, US. https://commons.wikimedia.org/wiki/Neodythemis_hildebrandti Why do we need non-relational databases? • Big data has driven the need for
  • 62. • NoSQL databases – For unstructured data • NewSQL databases – Bridging the gap between relational and NoSQL database design • Note: Querying language/method depends on the database used Recall: NoSQL Databases NoSQL (Not only SQL), i.e. non-relational databases • Are used to manage unstructured & semi-structured data
  • 63. • Sometimes called “Cloud” databases • Usually open source • Work on a distributed (parallel) data approach • General categories of non-relational databases (DBs): – Key-value DBs, e.g. shopping cart, sensor data – Document DBs, e.g. tweets, customer data, blog posts – Column-oriented DBs, e.g. time series, banking – Graph DBs, e.g. networks, social connections Coronel, C, and Morris, S 2019, Database Systems: Design, Implementation, & Management, 13th Edn., Cengage, Boston, USA. Activity 1: Match database type and
  • 65. Networks Time series Sensor data Banking Blog posts Social connections Example of Key-Value Database For example, student names and ages. The name is used as the key. Software • Windows Azure • Riak
  • 66. • Redis • Dynamo https://www.c-sharpcorner.com/UploadFile/f0b2ed/introduction-of-nosql-database/ Example: Document Database • For example, student names, ages & salaries • Each document has a unique key for searching • Documents appear as JavaScript Object Notation (JSON) files (semi-structured) Software
  • 67. • MongoDB • RavenDB • CouchDB • OrientDB
  • 70. https://www.c-sharpcorner.com/UploadFile/f0b2ed/introduction-of-nosql-database/ Example: JSON code JSON format code examples that could be used to exchange data to or from a web server: {“name”: “John”, “age”:30, “Car”: “Ford” } {“StreetNum”: 5, “streetName”: “King William”, “Lanes”: 4} Diagram labels: key, value, colon (:), curly brace 1. JSON objects are surrounded by curly braces {},
  • 71. 2. They are written in key & value pairs. 3. Keys must be strings, and values must be a valid JSON data type (string (i.e. text), number, object, array, boolean or null). 4. Keys and values are separated by a colon. 5. Each key/value pair is separated by a comma. Javascript: JSON and Ajax, 1998-2014 O’Reilly Media, Inc. available at archive.oreilly.com/oreillyschool/courses/javascript2/Javascript%20JSON%20and%20Ajax%20v2.pdf This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License. https://en.wikipedia.org/wiki/JSON Activity 2: JSON code
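One way to check a candidate JSON object against the five rules is to hand it to a parser. The sketch below uses Python's standard json module; the helper name is_valid_json and the invalid sample string are assumptions made for illustration. Note that a parser expects straight double quotes ("), not the curly quotes that slide software often substitutes.

```python
import json


def is_valid_json(text):
    """Return True if the text parses as JSON, False otherwise."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


# The first example from the slide, typed with straight double quotes.
valid = '{"name": "John", "age": 30, "Car": "Ford"}'

# A made-up invalid sample: JSON strings cannot use single quotes.
invalid = "{'name': 'John'}"

print(is_valid_json(valid))
print(is_valid_json(invalid))

# Once parsed, the object behaves like a set of key/value pairs.
person = json.loads(valid)
print(person["name"], person["age"])
```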
  • 72. • Why are these incorrectly coded? 1. (“name”: “John”, “age”:30, “Car”: “Ford” ) 2. {name: “John”, age:30, Car: “Ford” } 3. {“name”: “age”:30, “Car”: “Ford” } 4. {“name”: “John”, “age”:30, [Car]: [Ford] } 5. {“name”: “John” “age”:30 “Car”: “Ford” } More about MongoDB • A document database • Documents do not have to conform to the same structure (schema-less) • Documents with similar types are stored in
  • 73. collections; related collections are stored in a DB • The documents appear as JSON files to users Coronel, C, and Morris, S 2019, Database Systems: Design, Implementation, & Management, 13th Edn., Cengage, Boston, USA. Example: Column-Oriented Database • The same example in a row store (relational) and a column store (non-relational). Software: Cassandra and HBase Relational Table Column-centric storage Block 1 | 125670,145679,234466,785940,785840
  • 74. Block 2 | Ma,Jimmy,Peter,Sundar,Jiping Block 3 | 130,128,144,132,110 Block 4 | 85,78,88,82,70 Activity 3: Column-Oriented Database • Convert the subset of data from the week 7 Excel file (shown below) into column-centric format Relational Table Column-centric storage Block 1 | Block 2 |
  • 75. Block 3 | Block 4 | Case study: Fraud detection using a Graph Database • Neo4j video on Fraud detection • Watch the video and learn about graph database design https://www.youtube.com/watch?v=ujimD6MP87I Aggregate awareness • Aggregate awareness means that the data is grouped (or “aggregated”) around a central topic
  • 76. • For example, data collected in connection with an individual blog post, including – Title, content, date posted – Username, screen name – Comments made on the post, etc • Key value, document and column DBs are all aggregate aware Coronel, C, and Morris, S 2019, Database Systems: Design, Implementation, & Management, 13th Edn.,Cengage, Boston, USA. NewSQL Databases • Cloud-based to handle large amounts of data • E.g. ClustrixDB, NuoDB
  • 77. • Use SQL for queries • Use massively parallel query processing (MPP), i.e. data is spread across multiple servers, which process the data locally • Key-value and column-oriented data stores Case Study: Hit Labs ClustrixDB customer success story • Application: Hit Labs created the Bubble Group Messenger App (for group messaging and group chat) • It is free on iOS and Android devices • Originally built on Amazon's Aurora • Problem: Hit Labs wanted a database to support their rapid user growth
  • 78.