Introduction to Data Science: Introduction and Chapters 1, 2, 3
1. Data science
An emerging area of work concerned with the collection,
preparation, analysis, visualization, management, and
preservation of large collections of information.
2. Web page
Much of the data in the world is non-numeric and
unstructured. Unstructured means that the data are not arranged in neat
rows and columns. Think of a web page, which mixes text, images, and links
rather than fitting into a table.
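A minimal R sketch of the contrast, assuming a made-up two-row table and a made-up snippet of page text: the structured data fits a data frame with named columns, while the unstructured text has to be broken into pieces (here, words) before it can be counted or analysed.

```r
# Structured: hypothetical values arranged in neat rows and columns
people <- data.frame(name = c("Ana", "Ben"), age = c(34, 29))
nrow(people)   # number of rows
ncol(people)   # number of columns

# Unstructured: a hypothetical snippet of web page text with no rows or columns
pageText <- "Great photo from the beach! See you all next week."
# Split the text on spaces to get individual words before analysing it
words <- strsplit(pageText, " ")[[1]]
length(words)  # number of words found
```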
5. Data architect
Provides input on how the data need to be
routed and organized to support the analysis,
visualization, and presentation of the data to the
appropriate people.
6. Data acquisition
Focuses on how the data are collected and,
importantly, how the data are represented prior
to analysis and presentation.
Tool example: barcodes
Different barcodes are used for the same product
(for example, for different-sized boxes of cereal).
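One consequence for representation: before analysis, the different barcodes for one product have to be mapped to that single product. The R sketch below assumes made-up barcode numbers, product names, and unit counts; it uses a named lookup vector to recode each scan and then totals units per product with tapply().

```r
# Hypothetical scans: barcodes "0001" and "0002" are two box sizes of one cereal
scans <- data.frame(
  barcode = c("0001", "0002", "0003", "0001"),
  units   = c(2, 1, 4, 3),
  stringsAsFactors = FALSE
)
# Hypothetical lookup table mapping each barcode to a product name
productOf <- c("0001" = "Cereal A", "0002" = "Cereal A", "0003" = "Cereal B")
# Recode every scanned barcode to its product before analysis
scans$product <- unname(productOf[scans$barcode])
# Total units per product, regardless of which barcode was scanned
totals <- tapply(scans$units, scans$product, sum)
totals
```

Without the recoding step, the two box sizes of Cereal A would be counted as separate products.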
7. Data analysis
Using portions of the data (samples) to make
inferences about the larger context, and
visualizing the data by presenting it in tables,
graphs, and even animations.
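A minimal R sketch of inference from a sample, assuming a simulated population of 10,000 invented values: the mean of a random sample of 100 is used to estimate the population mean, and summary() presents the sample as a small table.

```r
set.seed(42)                                    # make the random draws repeatable
# Simulated "larger context": 10,000 invented values
population <- rnorm(10000, mean = 50, sd = 10)
mySample <- sample(population, size = 100)      # a portion of the data
sampleMean <- mean(mySample)                    # estimate based on the sample
populationMean <- mean(population)              # the quantity being inferred
round(c(sample = sampleMean, population = populationMean), 2)
summary(mySample)                               # a compact table of the sample
```

The sample mean will not match the population mean exactly, but with 100 values it is usually close.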
8. Data archiving
Preservation of collected data in a form that
makes it highly reusable. This "data curation" is
a difficult challenge because it is so hard to
anticipate all of the future uses of the data.
Example (Twitter):
Geocodes: data showing the geographical location
from which a tweet was sent could be a useful
element to store with the data.
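A small R sketch of archiving with the geocode kept alongside each message. The tweets, coordinates, and file name are invented for illustration; the idea is that storing latitude and longitude as their own columns in a plain CSV file keeps them available for later uses nobody anticipated.

```r
# Hypothetical tweets with the geocode stored as separate columns
tweets <- data.frame(
  text      = c("Morning run done", "Stuck in traffic"),
  latitude  = c(14.5995, 40.7128),
  longitude = c(120.9842, -74.0060),
  stringsAsFactors = FALSE
)
# Archive to a plain-text CSV file, a format most tools can read later
archiveFile <- file.path(tempdir(), "tweets_archive.csv")
write.csv(tweets, archiveFile, row.names = FALSE)
# Later reuse: read the archive back and check the geocodes survived
restored <- read.csv(archiveFile, stringsAsFactors = FALSE)
restored
```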
9. Learning the application domain
Communicating with data users
Seeing the big picture of a complex system
Knowing how data can be represented (metadata)
Data transformation and analysis
Visualization and presentation
Attention to quality
Ethical reasoning (privacy)
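Metadata is data that describes other data. One way to see this in R is to attach descriptive attributes to a vector with attr(); the attribute names ("units", "source") and their values below are made up for illustration.

```r
temps <- c(21.5, 23.0, 19.8)                      # hypothetical measurements
# Attach hypothetical metadata describing what the numbers mean
attr(temps, "units")  <- "degrees Celsius"
attr(temps, "source") <- "hypothetical weather station"
attributes(temps)                                 # view the metadata
mean(temps)                                       # the data values are unchanged
```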
12. “The fundamental problem of
communication is that of
reproducing at one point either
exactly or approximately a
message selected at another
point”
CLAUDE SHANNON
[Figure: answers encoded as binary digits (yes = 1, no = 0, maybe = 01); ASCII]
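A short R sketch of the same idea: one bit is enough for yes or no, a third answer such as maybe needs a second bit, and ASCII assigns each character a number that fits in 8 bits. The helper name toBits is our own, not a built-in R function.

```r
# One bit distinguishes two answers; two bits distinguish up to four
answers <- c("no", "yes", "maybe")
bitsNeeded <- ceiling(log2(length(answers)))     # bits needed for 3 answers
bitsNeeded

# Hypothetical helper: show a character's ASCII code as 8 bits
toBits <- function(ch) {
  code <- utf8ToInt(ch)                          # character -> numeric code
  bits <- rev(as.integer(intToBits(code))[1:8])  # lowest 8 bits, most significant first
  paste(bits, collapse = "")
}
utf8ToInt("A")
toBits("A")
```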
13. Identifying Data Problems
Data science is an applied activity, and data scientists
serve the needs and solve the problems of data users.
Hint:
The data scientist may never actually become a
farmer, but if you are going to identify a data problem
that a farmer has, you have to learn to think like a
farmer, to some degree.
Three approaches:
Talk to subject matter experts.
Ask about anomalies.
Ask about risks and uncertainty.
14. Introduction to R
R is an integrated suite of software facilities for data
manipulation, calculation, and graphical display.
R is an open source software program. It provides:
an effective data handling and storage facility,
a suite of operators for calculations on arrays, in
particular matrices,
a large, coherent, integrated collection of
intermediate tools for data analysis,
graphical facilities for data analysis and display
either directly at the computer or on hardcopy.
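As a small taste of the operators for arrays and matrices, the R sketch below builds a 2 x 2 matrix from made-up numbers and applies element-wise and matrix operations.

```r
m <- matrix(c(1, 2, 3, 4), nrow = 2)  # filled column by column
m * 2                                 # element-wise: every entry doubled
t(m)                                  # transpose
m %*% m                               # matrix multiplication
solve(m)                              # inverse of m
```

Note the difference between `*`, which works entry by entry, and `%*%`, which is true matrix multiplication.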
15. Additional Pros:
R was among the first analysis programs to
integrate capabilities for drawing data directly from
the Twitter® social media platform.
The extensibility of R means that new modules (packages) are
being added all the time by volunteers.
The lessons one learns in working with R are almost
universally applicable to other programs and
environments.
16. Cons:
R is command-line oriented.
R is not especially good at giving feedback or error
messages.
17. How to create a text string
myText <- "this is a piece of text"
Create a data set:
myFamilyAges <- c(43, 42, 12, 8, 5)
c(): Concatenates data elements together
Assignment arrow: <-
Some mathematical functions:
sum(): adds the data elements
range(): returns the minimum and maximum values
mean(): returns the average
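Putting these pieces together, a short R session using the myText and myFamilyAges values from this slide; length() and nchar() are standard R functions added here to show the size of each object.

```r
myText <- "this is a piece of text"      # assignment arrow stores a text string
nchar(myText)                            # 23 characters, spaces included
myFamilyAges <- c(43, 42, 12, 8, 5)      # c() combines the ages into one vector
length(myFamilyAges)                     # 5 elements
sum(myFamilyAges)                        # 110
range(myFamilyAges)                      # 5 43
mean(myFamilyAges)                       # 22
```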