Big Data & DS Analytics
for PAARL
Albert Anthony D. Gavino, MBA
Data Scientist / DS Evangelist
About the speaker: Albert Anthony D. Gavino
Project profile
Program Objectives / Program Goals
Participants to be able to relate Big Data and Data Science
applications to Library services.
1. What is Big Data?
Extremely large data sets that may be analyzed to reveal patterns,
trends and associations
The BIG 3 V’s
• Variety: different types of data
(Facebook, Twitter, CCTV feed)
• Velocity: the speed at which data arrives
(batch, streaming every second)
• Volume: the sheer size of the data
(1 GB, 1 TB, 1 PB, 1 ZB)
Library Data Resources
What resources does the library have (budget, staff, premises, media, opening
hours etc.) and how is the library performing against traditional parameters, like
lending figures, visitors and social media activity? This library data can also be
combined with environmental information like community education levels,
geographical distances, age and so on.
http://www.axiell.co.uk/gettingthemostfromyourlibrarydata/
DATA Analytics Challenges and Pitfalls
The challenges to creating a robust institutional data analytics program include
culture, talent, cost, and data. We have deliberately mentioned culture first
because it is very easy to jump to data challenges. In fact, most of the literature
surrounding data analytics starts with challenges surrounding the data itself.
However, we are convinced that institutional culture is the most important factor
in determining the success of any given data analytics program, including the
politics and process around questions of talent, cost, and data itself.
Reference: The Journal of Academic Librarianship, "Libraries and Institutional Data Analytics:
Challenges and Opportunities"
63% of researchers and administrators expressed unhappiness with the use
of metrics in higher education (Abbott et al., 2010)
What about New Tasks like streamlining for the Librarian?
If librarians take on new tasks, it is very important to track the amount of time and level of staff
required when undertaking analytics projects. For example, collecting citation data for a
researcher with a common name often requires manual and painstaking record-by-record
searching in order to disambiguate that individual's research from others that share his/her
name. This type of work requires a librarian with a deep and intimate knowledge of the
bibliometric databases that are being used to harvest the bibliometric data.
Reference: The Journal of Academic Librarianship, "Libraries and Institutional Data Analytics:
Challenges and Opportunities"
What is the Cost?
• Data analytics should be thought of as a strategic investment,
not a cost-saving technique
• The real cost is the time spent on cultural change and on
developing and educating a staff with the analytical skills that we
need in our discipline
• A visionary analytics plan invests in people, in hiring and training,
over data tools and platforms
Pitfalls of Data Sharing:
Challenges on Institutional Data Analytics
Pitfall | Possible Solution/s
Ownership: who owns the data? It could be the registrar, the library or IT services. | An assigned office, e.g. the Office of the President or the Compliance Office, can release the official reports.
Quality: deciding when data is accurate, good and reliable. | A Data Governance Unit assures the quality of data.
Standards: what kinds of data variables are in use (string, numeric). | This can be addressed by Data Management through data warehousing.
Access: who has access to the data. | User roles can be defined to set who has access.
Getting Started on Institutional Data
• Creating an inventory of institutional data
• Developing a data dictionary
• Designing an unambiguous process for cleaning up those data
• Creating an open data set that answers to the most commonly asked data
questions across campus.
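As a rough illustration of the inventory, data dictionary and clean-up steps, a data dictionary can start as a small table of field names, expected types and descriptions that a clean-up routine checks records against. This is only a sketch under stated assumptions: the field names (`patron_id`, `loan_date`, `branch`) and the `validate` helper are hypothetical and not taken from any particular library system.

```python
from datetime import date

# Hypothetical data dictionary: field name -> (expected type, description).
DATA_DICTIONARY = {
    "patron_id": (str, "Anonymized identifier of the borrower"),
    "loan_date": (date, "Date the item was checked out"),
    "branch": (str, "Library branch where the loan happened"),
}


def validate(record):
    """Return a list of problems found in one record (empty if clean)."""
    problems = []
    # Every documented field must be present and have the documented type.
    for field, (expected_type, _description) in DATA_DICTIONARY.items():
        if field not in record:
            problems.append(f"missing field: {field}")
        elif not isinstance(record[field], expected_type):
            problems.append(f"wrong type for {field}")
    # Fields that are not in the dictionary are flagged as undocumented.
    for field in record:
        if field not in DATA_DICTIONARY:
            problems.append(f"undocumented field: {field}")
    return problems


# Invented example records, one clean and one with three problems.
clean = {"patron_id": "P001", "loan_date": date(2017, 3, 1), "branch": "Main"}
dirty = {"patron_id": 17, "branch": "Main", "notes": "late"}

print(validate(clean))
print(validate(dirty))
```

Keeping the dictionary as data rather than as scattered checks means the same table can double as the documentation shared across campus.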
Opportunities for Libraries on Big Data
• Libraries know metadata
• Libraries know strategy
• Libraries know assessment
• Libraries are neutral
• Libraries know the vendors
• Libraries are part of larger bodies like PAARL
• Libraries have influence over campuses
• Libraries know metrics
• Libraries have user-centered culture
• Libraries know the politics and policy issues with commercial parties
• Libraries collaborate with both academic and academic support
2. Building a BIG DATA culture
• Openness and acceptance to technology: Upper Management
• Willingness to invest in the Big Data Platform: which entails cost
• Training Staff and making sure of job security: Skills upgrade
• Make data sharing acceptable: Trust in the data quality and people
• Create Data Quality Assurance Team/s
• Foster collaboration among departments
• Continuous improvement of models
DATA Governance and DATA Management are different roles
Data governance is the designation of decision-rights and policy-making surrounding institutional data,
while data management is the implementation of those decisions and policies. Institutions need both,
and both require investment, but the senior leadership of our institutions need to design the former.
Diagram labels: Data Governance Council, Data Management, policies, metrics, Data Quality Dept, Data Warehouse / Data Lake
Machine Learning
Is a type of artificial intelligence that provides
computers with the ability to learn without being
explicitly programmed.
Market Basket Analysis on Book Recommendations (Association Rule Algorithm)
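A minimal sketch of the idea behind association rules, assuming borrowing histories are available as sets of titles. The titles, the thresholds and the `association_rules` helper are invented for illustration. The code enumerates item pairs by brute force and computes support (share of patrons who borrowed both titles) and confidence (share of borrowers of the first title who also borrowed the second); it is not a full Apriori implementation.

```python
from collections import Counter
from itertools import combinations

# Hypothetical borrowing histories: each set is the titles one patron borrowed.
baskets = [
    {"Dune", "Foundation", "Neuromancer"},
    {"Dune", "Foundation"},
    {"Dune", "Emma"},
    {"Foundation", "Neuromancer"},
    {"Dune", "Foundation", "Emma"},
]


def association_rules(baskets, min_support=0.4, min_confidence=0.7):
    """Return (antecedent, consequent, support, confidence) for item pairs."""
    n = len(baskets)
    item_counts = Counter(item for basket in baskets for item in basket)
    pair_counts = Counter(
        pair for basket in baskets for pair in combinations(sorted(basket), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        # Check the rule in both directions: a -> b and b -> a.
        for antecedent, consequent in ((a, b), (b, a)):
            confidence = count / item_counts[antecedent]
            if confidence >= min_confidence:
                rules.append((antecedent, consequent, support, confidence))
    return sorted(rules)


for antecedent, consequent, support, confidence in association_rules(baskets):
    print(f"{antecedent} -> {consequent}: "
          f"support={support:.2f}, confidence={confidence:.2f}")
```

A rule such as "Neuromancer -> Foundation" would then translate into a recommendation shown to patrons who borrow the first title.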
Weather related information and reading a book (use of hash tags and location and weather data)
Pic from Marco Rasos
Social Listening – is the process of monitoring digital conversations to
understand what customers are saying about a brand or service.
Online Research Journals and Click through Rates
Click-through Rate (CTR)
The ratio of users who click on a specific link, ad or button
to the total number of users who view the page it appears on.
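As a small worked example, CTR is usually computed as clicks divided by impressions (views). The figures below are hypothetical and the `click_through_rate` helper is only an illustrative name.

```python
def click_through_rate(clicks, impressions):
    """Return CTR as a fraction: clicks divided by impressions."""
    if impressions <= 0:
        raise ValueError("impressions must be positive")
    return clicks / impressions


# Hypothetical figures: a journal link shown 2,000 times and clicked 50 times.
ctr = click_through_rate(50, 2000)
print(f"CTR = {ctr:.1%}")
```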
OpenCV (Open Source Computer Vision Library)
Modern Day Data Scientists
Dr. Reina Reyes, Astrophysicist
Andrew Ng of Baidu, Coursera
Amy Smith, Uber Singapore
Data Science Conference 2016
YOU as the next
Doctor Strange
(Entering the world of
Data Science)
Isaac Reyes, Data Scientist Talas Data Scientists
CRISP-DM Methodology
The project was led by
five companies: SPSS,
Teradata, Daimler AG,
NCR Corporation and
OHRA, an insurance
company
CRISP-DM Tasks
From regular data to BIG data, from stat to AI
Data axis: Regular data to BIG data
Methods axis (Traditional to Modern): Statistical modeling, Machine Learning, Deep Learning / A.I.
Trends in Data Science Domains
Data Science Domain | Current Status
Natural Language Processing (NLP) | Entered the market
Predictive Analytics / Machine Learning | Entered the market
Visualization / Dashboards | Entered the market
Image Processing (OpenCV) | Exploration
Internet of Things (IoT) | Exploration
Artificial Intelligence | Exploration
DS/Big Data Applications to the field of Study
Sample Field of Study | Specific Applications
Agriculture | Climate forecast modeling to help farmers manage plantations (e.g. corn yields)
Medical field | Image processing for chest x-rays, retina images for diabetic patients
Linguistics | Natural Language Processing (NLP) for dialects and Sentiment Analysis applications
Economics/Finance | Predicting a stock price based on certain indicators (e.g. noise, competitor price)
Engineering | Internet of Things (IoT) application to Big Data
Building a Data Science Team
DS role | Prog Language | Sample output
Data Scientist | R, Python, Spark ML | Neural Nets, Random Forest
Data Engineer / DevOps | Hadoop, Spark Core, Spark Streaming | RDD, dataframes, SQLContext
Statistician | SAS, SPSS, R, Matlab | Linear Regression, K-means clustering
Viz Expert | Tableau, Cognos, D3, JavaScript | visualization, GIS maps
Data Science Team Composition
Trends on Programming Languages
Scala, R, Python, Spark, RapidMiner, EMC, Java
TOOLS: OPEN SOURCE vs PROPRIETARY SOFTWARE
 | OPEN SOURCE | PROPRIETARY SOFTWARE
pros | No cost on software, packages are available faster | Easy to deploy
cons | Takes some time to create and integrate with other software | Expensive software, you have to buy in modules
tools | Python, R, Apache Spark | SAS, IBM-SPSS, AWS, Google
Small Data vs Big Data (in comparison)
Small data | Big data
Works from a sample (e.g. a survey) | Uses all of the data in storage
No need for in-memory or distributed computing; can be run on a regular PC/Mac | Eats up memory and needs distributed computing
Classical statistical assumptions (normality, homoskedasticity, independence) can be checked and relied on | Significance tests need care: with very large data sets, p-values shrink, so differences that look insignificant in small samples become statistically significant
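The caution about p-values can be illustrated with a short sketch. It assumes a one-sample z-test with a known standard deviation and a fixed, practically negligible difference of 0.01 standard deviations; the numbers and the `two_sided_p_value` helper are hypothetical. The only thing that changes between runs is the sample size.

```python
import math


def two_sided_p_value(mean_difference, std_dev, n):
    """Two-sided p-value of a one-sample z-test for a given sample size n."""
    z = mean_difference / (std_dev / math.sqrt(n))
    # For a standard normal Z, P(|Z| > |z|) equals erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical effect: a difference of 0.01 standard deviations.
for n in (100, 10_000, 1_000_000):
    p = two_sided_p_value(0.01, 1.0, n)
    print(f"n = {n:>9,}: p = {p:.3g}")
```

With the same tiny effect, the p-value falls from far above 0.05 at n = 100 to far below it at n = 1,000,000, which is why effect size should be reported alongside significance on big data.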
Simple DS Cheat sheet
Classifiers: Neural Nets, Random Forest, SVM (cancer cells)
Clustering: K-means, Hierarchical Clustering
Association: Association Rules
Predicting: Linear Regression, Logistic Regression (binary), Cox Regression (survival)
Example domain: Medical
Visualization TOOLS
Color Hues and Functionality
Local Implications: Data Privacy Act of 2012 (Republic Act No. 10173)
Sensitive personal information refers to personal information:
1. About an individual’s race, ethnic origin, marital status, age, color, and religious, philosophical or
political affiliations;
2. About an individual’s health, education, genetic or sexual life of a person, or to any proceeding for
any offense committed or alleged to have been committed by such individual, the disposal of such
proceedings, or the sentence of any court in such proceedings;
3. Issued by government agencies peculiar to an individual which includes, but is not limited to, social
security numbers, previous or current health records, licenses or its denials, suspension or
revocation, and tax returns; and
4. Specifically established by an executive order or an act of Congress to be kept classified.
Complying with the Data Privacy Act: Policies
Make sure you have the following in place:
• Opt-in for customers
• Opt-out for customers
• An updated customer policy
• A publicly available policy, e.g. on your website
References
• www.coursera.org/learn/machine-learning
• www.kaggle.com
• www.crowdanalytix.com
• www.talas.ph
• www.facebook.com/analytics4pinoys
• www.linkedin.com/albertgavino

 
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTiammrhaywood
 
ENGLISH6-Q4-W3.pptxqurter our high choom
ENGLISH6-Q4-W3.pptxqurter our high choomENGLISH6-Q4-W3.pptxqurter our high choom
ENGLISH6-Q4-W3.pptxqurter our high choomnelietumpap1
 

Recently uploaded (20)

Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17
 
DATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginnersDATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginners
 
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdfLike-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
 
Field Attribute Index Feature in Odoo 17
Field Attribute Index Feature in Odoo 17Field Attribute Index Feature in Odoo 17
Field Attribute Index Feature in Odoo 17
 
Roles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in PharmacovigilanceRoles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in Pharmacovigilance
 
AmericanHighSchoolsprezentacijaoskolama.
AmericanHighSchoolsprezentacijaoskolama.AmericanHighSchoolsprezentacijaoskolama.
AmericanHighSchoolsprezentacijaoskolama.
 
How to do quick user assign in kanban in Odoo 17 ERP
How to do quick user assign in kanban in Odoo 17 ERPHow to do quick user assign in kanban in Odoo 17 ERP
How to do quick user assign in kanban in Odoo 17 ERP
 
ROOT CAUSE ANALYSIS PowerPoint Presentation
ROOT CAUSE ANALYSIS PowerPoint PresentationROOT CAUSE ANALYSIS PowerPoint Presentation
ROOT CAUSE ANALYSIS PowerPoint Presentation
 
Planning a health career 4th Quarter.pptx
Planning a health career 4th Quarter.pptxPlanning a health career 4th Quarter.pptx
Planning a health career 4th Quarter.pptx
 
Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...
 
Proudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxProudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptx
 
What is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERPWhat is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERP
 
Employee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxEmployee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptx
 
Solving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxSolving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptx
 
Full Stack Web Development Course for Beginners
Full Stack Web Development Course  for BeginnersFull Stack Web Development Course  for Beginners
Full Stack Web Development Course for Beginners
 
Gas measurement O2,Co2,& ph) 04/2024.pptx
Gas measurement O2,Co2,& ph) 04/2024.pptxGas measurement O2,Co2,& ph) 04/2024.pptx
Gas measurement O2,Co2,& ph) 04/2024.pptx
 
Earth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatEarth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice great
 
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
 
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
 
ENGLISH6-Q4-W3.pptxqurter our high choom
ENGLISH6-Q4-W3.pptxqurter our high choomENGLISH6-Q4-W3.pptxqurter our high choom
ENGLISH6-Q4-W3.pptxqurter our high choom
 

Big Data & DS Analytics for PAARL

  • 1. Big Data & DS Analytics for PAARL Albert Anthony D. Gavino, MBA Data Scientist / DS Evangelist
  • 2. About the speaker: Albert Anthony D. Gavino
  • 4. Program Objectives / Program Goals Participants to be able to relate Big Data and Data Science applications to Library services.
  • 5. 1. What is Big Data? Extremely large data sets that may be analyzed to reveal patterns, trends and associations
  • 6. The BIG 3 V’s
      • Variety: different types of data (Facebook, Twitter, CCTV feed)
      • Velocity: the speed at which data comes in (batch, streaming every second)
      • Volume: the size of the data (1 GB, 1 TB, 1 PB, 1 ZB)
  • 7. Library Data Resources What resources does the library have (budget, staff, premises, media, opening hours etc.) and how is the library performing against traditional parameters, like lending figures, visitors and social media activity? This library data can also be combined with environmental information like community education levels, geographical distances, age and so on. http://www.axiell.co.uk/gettingthemostfromyourlibrarydata/
  • 8. DATA Analytics Challenges and Pitfalls The challenges to creating a robust institutional data analytics program include culture, talent, cost, and data. We have deliberately mentioned culture first because it is very easy to jump to data challenges. In fact, most of the literature surrounding data analytics starts with challenges surrounding the data itself. However, we are convinced that institutional culture is the most important factor in determining the success of any given data analytics program, including the politics and process around questions of talent, cost, and data itself. Reference: The Journal of Academic Librarianship, Libraries and Institutional Data Libraries: Challenges and Opportunities 63% of researchers and administrators expressed unhappiness with the use of metrics in higher education (Abbott et al., 2010)
  • 9. What about New Tasks like streamlining for the Librarian? If librarians take on new tasks, it is very important to track the amount of time and level of staff required when undertaking analytics projects. For example, collecting citation data for a researcher with a common name often requires manual and painstaking record-by-record searching in order to disambiguate that individual's research from others that share his/her name. This type of work requires a librarian with a deep and intimate knowledge of the bibliometric databases that are being used to harvest the bibliometric data. Reference: The Journal of Academic Librarianship, Libraries and Institutional Data Libraries: Challenges and Opportunities
  • 10. What is the Cost?
      • Data analytics should be thought of as a strategic investment, not a cost-saving technique.
      • The real cost is the time spent on cultural change and on developing and educating a staff with the analytical skills that we need in our discipline.
      • A visionary analytics plan invests in people, in hiring and training, over data tools and platforms.
  • 11. Pitfalls of Data Sharing: Challenges on Institutional Data Analytics (Pitfalls and Possible Solution/s)
      • Ownership: who owns the data? It could be the registrar, the library or IT services. Possible solution: an assigned office (e.g. the Office of the President or the Compliance Office) can release the official reports.
      • Quality: deciding when the data is accurate, good and reliable. Possible solution: a Data Governance Unit assures the quality of the data.
      • Standards: what kinds of data variables are in use (string, numeric)? Possible solution: this can be addressed by Data Management during data warehousing.
      • Access: who has access to the data? Possible solution: user roles can be defined to determine who has access.
  • 12. Getting Started on Institutional Data
      • Creating an inventory of institutional data
      • Developing a data dictionary
      • Designing an unambiguous process for cleaning up those data
      • Creating an open data set that answers the most commonly asked data questions across campus
  • 13. Opportunities for Libraries on Big Data
      • Libraries know metadata
      • Libraries know strategy
      • Libraries know assessment
      • Libraries are neutral
      • Libraries know the vendors
      • Libraries are part of larger bodies like PAARL
      • Libraries have influence over campuses
      • Libraries know metrics
      • Libraries have a user-centered culture
      • Libraries know the politics and policy issues with commercial parties
      • Libraries collaborate with both academic and academic support units
  • 14. 2. Building a BIG DATA culture
      • Openness to and acceptance of technology: Upper Management
      • Willingness to invest in the Big Data Platform, which entails cost
      • Training staff and ensuring job security: skills upgrade
      • Make data sharing acceptable: trust in the data quality and people
      • Create Data Quality Assurance Team/s
      • Foster collaboration among departments
      • Continuous improvement of models
  • 15. DATA Governance and DATA Management are different roles. Data governance is the designation of decision-rights and policy-making surrounding institutional data, while data management is the implementation of those decisions and policies. Institutions need both, and both require investment, but the senior leadership of our institutions needs to design the former. Diagram labels: Data Governance Council, Data Management, policies, metrics, Data Quality Dept, Data Warehouse / Data Lake
  • 16. Machine Learning: a type of artificial intelligence that provides computers with the ability to learn without being explicitly programmed.
  • 17. Market Basket Analysis on Book Recommendations (Association Rule Algorithm)
  • 18. Weather related information and reading a book (use of hash tags and location and weather data) Pic from Marco Rasos
  • 19. Social Listening – is the process of monitoring digital conversations to understand what customers are saying about a brand or service.
  • 20. Online Research Journals and Click-through Rates. Click-through rate (CTR): the ratio of users who click on a specific link, ad or button to the total number of users who view it.
  • 21. OpenCV (Open Source Computer Vision Library)
  • 22. Modern Day Data Scientists: Dr. Reina Reyes, Astrophysicist; Andrew Ng of Baidu, Coursera; Amy Smith, Uber Singapore Data Science Conference 2016; Isaac Reyes, Data Scientist; Talas Data Scientists. YOU as the next Doctor Strange (entering the world of Data Science)
  • 23. CRISP-DM Methodology. The project was led by five companies: SPSS, Teradata, Daimler AG, NCR Corporation and OHRA, an insurance company.
  • 25. From regular data to BIG data, from stat to AI. Diagram labels: Regular data, BIG data; Statistical modeling, Machine Learning, Deep Learning / A.I.; Traditional, Modern
  • 26. Trends in Data Science Domains (Data Science Domain: Current Status)
      • Natural Language Processing (NLP): Entered the market
      • Predictive Analytics / Machine Learning: Entered the market
      • Visualization / Dashboards: Entered the market
      • Image Processing (OpenCV): Exploration
      • Internet of Things (IoT): Exploration
      • Artificial Intelligence: Exploration
  • 27. DS/Big Data Applications to the field of Study (Sample Field of Study: Specific Applications)
      • Agriculture: Climate forecast modeling to help farmers manage plantations (e.g. corn yields)
      • Medical field: Image processing for chest x-rays, retina images for diabetic patients
      • Linguistics: Natural Language Processing (NLP) for dialects and Sentiment Analysis applications
      • Economics/Finance: Predicting a stock price based on certain indicators (e.g. noise, competitor price)
      • Engineering: Internet of Things (IoT) application to Big Data
  • 28. Building a Data Science Team: Data Science Team Composition (DS role: Prog Language; Sample output)
      • Data Scientist: R, Python, Spark ML; Neural Nets, Random Forest
      • Data Engineer / DevOps: Hadoop, Spark Core, Spark Streaming; RDD, dataframes, SQLContext
      • Statistician: SAS, SPSS, R, Matlab; Linear Regression, K-means clustering
      • Viz Expert: Tableau, Cognos, D3, Javascript; visualization, GIS maps
  • 29. Trends on Programming Languages: Scala, R, Python, Spark, RapidMiner, EMC, Java
  • 30. TOOLS: OPEN SOURCE vs PROPRIETARY SOFTWARE
      • Pros. Open source: no cost on software, packages are available faster. Proprietary: easy to deploy.
      • Cons. Open source: takes some time to create and integrate with other software. Proprietary: expensive software, you have to buy in modules.
      • Tools. Open source: Python, R, Apache Spark. Proprietary: SAS, IBM SPSS, AWS, Google.
  • 31. Small Data vs Big Data (in comparison)
      • Sampling. Small data: a sample can be drawn (e.g. a survey). Big data: use all of the data in the storage.
      • Computing. Small data: no need for memory computing; can be run on a regular PC/Mac. Big data: eats up memory and needs distributed computing.
      • Statistics. Small data: statistical assumptions hold true (normality, homoskedasticity, independence). Big data: measures like p-values stop being reliable guides because the data is so large; what seems not significant in small sets will become significant, so be careful when relying on these assumptions.
  • 32. Simple DS Cheat sheet
      • Classifiers: Neural Nets, Random Forest
      • Clustering: K-means, Hierarchical Clustering
      • Association: Association Rules
      • Predicting: Linear Regression, Logistic Regression (binary), Cox Regression (survival)
      • Medical: SVM (cancer cells)
  • 34. Color Hues and Functionality
  • 35. Local Implications: Data Privacy Act (Republic Act No. 10173). Sensitive personal information refers to personal information:
      1. About an individual’s race, ethnic origin, marital status, age, color, and religious, philosophical or political affiliations;
      2. About an individual’s health, education, genetic or sexual life of a person, or to any proceeding for any offense committed or alleged to have been committed by such individual, the disposal of such proceedings, or the sentence of any court in such proceedings;
      3. Issued by government agencies peculiar to an individual which includes, but is not limited to, social security numbers, previous or current health records, licenses or its denials, suspension or revocation, and tax returns; and
      4. Specifically established by an executive order or an act of Congress to be kept classified.
  • 36. Solutions to the Data Privacy Act: Policies. Make sure you have the following in place:
      • Opt in for customers
      • Opt out for customers
      • Update your customer policy accordingly
      • Make your policy available publicly, e.g. on websites
  • 37. References • www.coursera.org/learn/machine-learning • www.kaggle.com • www.crowdanalytix.com • www.talas.ph • www.facebook.com/analytics4pinoys • www.linkedin.com/albertgavino
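
Slide 17 names market basket analysis (association rules) for book recommendations without showing how a rule is scored. A minimal sketch of the idea follows. It is not the speaker's implementation: the loan records and titles are invented, the function name pair_rules is hypothetical, and only pairs of books are considered rather than the larger item sets a full association rule algorithm would mine.

```python
from collections import Counter
from itertools import combinations

# Hypothetical loan records: each set holds the titles one patron borrowed together.
loans = [
    {"Dune", "Foundation", "Neuromancer"},
    {"Dune", "Foundation"},
    {"Dune", "Emma"},
    {"Emma", "Persuasion"},
    {"Dune", "Foundation", "Emma"},
]


def pair_rules(baskets, min_support=0.4):
    """Score rules "borrowed X -> also borrowed Y" for every frequent pair of titles.

    Returns tuples (x, y, support, confidence, lift), highest lift first.
    """
    n = len(baskets)
    item_counts = Counter(item for basket in baskets for item in basket)
    pair_counts = Counter(
        pair for basket in baskets for pair in combinations(sorted(basket), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # share of all baskets containing both titles
        if support < min_support:
            continue
        for x, y in ((a, b), (b, a)):
            confidence = count / item_counts[x]  # estimate of P(y | x)
            lift = confidence / (item_counts[y] / n)  # above 1 means positive association
            rules.append((x, y, support, confidence, lift))
    return sorted(rules, key=lambda rule: rule[4], reverse=True)


for x, y, support, confidence, lift in pair_rules(loans):
    print(f"{x} -> {y}: support={support:.2f} confidence={confidence:.2f} lift={lift:.2f}")
```

A rule with lift above 1 suggests the two titles are borrowed together more often than chance alone would predict, which is the signal a recommendation shelf or catalogue widget could use.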
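
Slide 20 describes the click-through rate as a ratio. As a small worked example, assuming the usual definition of clicks divided by the number of times a link was shown, the calculation could look like this. The journal names and counts are made up for illustration, and click_through_rate is a hypothetical helper name.

```python
def click_through_rate(clicks: int, impressions: int) -> float:
    """Return clicks / impressions, or 0.0 when the link was never shown."""
    if clicks < 0 or impressions < 0:
        raise ValueError("counts must be non-negative")
    if impressions == 0:
        return 0.0
    return clicks / impressions


# Hypothetical counts: (clicks, times shown) for links on a journal landing page.
journal_links = {"Journal A": (120, 4000), "Journal B": (45, 900)}

for name, (clicks, shown) in journal_links.items():
    print(f"{name}: CTR = {click_through_rate(clicks, shown):.1%}")
```

Comparing the rate rather than the raw click count keeps a rarely shown link from looking unpopular just because few users ever saw it.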
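
Slide 31 warns that differences which look insignificant in small data sets become significant in very large ones. The sketch below illustrates that point with a two-sided z-test on a fixed, tiny difference in means. The scenario (two library branches differing by 0.1 loans per patron with a standard deviation of 5) is invented, and the test assumes equal group sizes and a known standard deviation, which real data rarely offers.

```python
import math


def two_sample_z_pvalue(mean_diff: float, sd: float, n_per_group: int) -> float:
    """Two-sided p-value for a difference in means (equal group sizes, known sd)."""
    standard_error = sd * math.sqrt(2.0 / n_per_group)
    z = mean_diff / standard_error
    # For a standard normal Z, P(|Z| > |z|) equals erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2.0))


# Same hypothetical effect, growing sample: only the p-value changes.
for n in (100, 10_000, 1_000_000):
    p = two_sample_z_pvalue(mean_diff=0.1, sd=5.0, n_per_group=n)
    print(f"n per group = {n:>9,}: p = {p:.3g}")
```

The effect itself never changes across the three runs, so with big data it is usually more informative to report the size of a difference than to rely on whether its p-value crosses a threshold.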