Utilization of Big Data in Research
Lala Septem Riza
Graduate School (Sekolah Pascasarjana)
2023
Outline
1. Introduction to Data Science
2. Phenomena and Definition of Big Data
3. Platforms, Technologies, Tools, and Methods in Big Data Analysis
4. Implementations and Research in Big Data
Introduction to Data Science
• Data science is the study that focuses on knowledge extraction from
data: data collection, preparation, analysis, visualization,
management, recommendation, etc.
• Data science is an interdisciplinary field that requires hacking skills
(i.e., programming), math and statistics knowledge, and substantive
expertise in a field of science.
Processes in Data Science
1. Objectives: ask the right questions to find what the problem is.
2. Data collection: get relevant data for analyzing the problem.
3. Data preprocessing: explore the data and correct errors (cleaning and organizing).
4. Computational and data models: descriptive, predictive, etc.
5. Reporting/dissemination/publication.
Data Science: Software and
Implementations|4
Final Goals in Data Analysis
1. Decision analytics: supports decision-making with visual analytics
that reflect reasoning.
2. Descriptive analytics: provides insight from historical data with
reporting, score cards, clustering, etc.
3. Predictive analytics: employs predictive modeling using statistical
and machine learning techniques.
4. Prescriptive analytics: recommends decisions using optimization,
simulation, etc.
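The difference between descriptive and predictive analytics can be sketched in a few lines. This is a minimal illustration on made-up data (the `hours` and `scores` values are hypothetical), using an ordinary least-squares line as a stand-in for "predictive modeling":

```python
from statistics import mean

# Hypothetical data: hours studied and exam scores (illustrative only).
hours = [1, 2, 3, 4]
scores = [55, 60, 65, 70]

# Descriptive analytics: summarize what has already happened.
average_score = mean(scores)

# Predictive analytics: fit a least-squares line and extrapolate.
x_bar, y_bar = mean(hours), mean(scores)
slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(hours, scores)) / sum(
    (x - x_bar) ** 2 for x in hours
)
intercept = y_bar - slope * x_bar

def predict(x: float) -> float:
    """Predicted score for x hours of study under the fitted line."""
    return intercept + slope * x

print("average score:", average_score)
print("predicted score for 5 hours:", predict(5))
```

Prescriptive analytics would go one step further and search for the input (for example, a study plan) that optimizes the predicted outcome.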
Phenomena of Big Data
Volume of digital data, 2010 to 2025 (in zettabytes; 1 zettabyte = 10^21 bytes).
Internet Activities
Big Data
Petabyte: 10^15 bytes
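To put the two units in relation, a short sketch of the conversion, assuming decimal (SI) units where one petabyte is 10^15 bytes and one zettabyte is 10^21 bytes (the function name is illustrative):

```python
# Decimal (SI) byte units.
PETABYTE = 10**15   # bytes
ZETTABYTE = 10**21  # bytes

def zettabytes_to_petabytes(zb: float) -> float:
    """Convert zettabytes to petabytes (1 ZB = 10^6 PB)."""
    return zb * (ZETTABYTE // PETABYTE)

print(zettabytes_to_petabytes(1))  # one zettabyte expressed in petabytes
```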
The Shift of Marketplace
What is Big Data?
1. Volume: the huge amounts of data being stored.
2. Velocity: the speed at which data streams must be processed and analyzed.
3. Variety: the different sources and forms from which data is collected, such as numbers, text, video, images, and audio.
9Vs of Big Data Definitions
Historical/Traditional technologies don’t work
because …
Challenges in Big Data
Technology and Method in Big
Data Analysis
Issues in Big Data technologies:
1. Computational models: how the data are processed and analyzed → Data Analysis/Data Science.
2. Database/storage frameworks: technologies and mechanisms to write, read, and manage Big Data efficiently. Fault tolerance, availability, consistency, scalability, and heterogeneity of Big Data must be handled as well.
Big Data Platform
Big Data Platforms
• Redundant and reliable: platforms replicate data automatically, so when a machine goes down there is no data loss.
• Runs on commodity hardware: Don’t have to buy special hardware,
expensive RAIDs, or redundant hardware; reliability is built into
software.
• Scale out rather than scale up.
• Bring code to data rather than data to code.
• Fault tolerant/Deal with failures.
• Break disk read barrier.
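The "bring code to data" idea can be sketched in plain Python. This is a toy illustration, not a Hadoop or Spark API: each inner list is a hypothetical stand-in for a data block stored on one node, the counting function runs once per block, and only the small per-block summaries are merged.

```python
from collections import Counter
from functools import reduce

# Hypothetical partitions: each list stands in for a data block on one node.
partitions = [
    ["big data", "data science"],
    ["big data platform"],
    ["data analysis"],
]

def count_words(lines):
    """'Map' step: runs where the block lives and returns a small summary."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts

def merge(a, b):
    """'Reduce' step: combine per-partition summaries."""
    return a + b

total = reduce(merge, (count_words(p) for p in partitions), Counter())
print(total.most_common(3))
```

Only the word counts travel between steps, never the raw lines, which is the point of moving computation to where the data sits.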
Introduction to Apache Hadoop
Hadoop History Timeline
• In April 2008, Hadoop broke a world record to become the
fastest system to sort an entire terabyte of data. Running on
a 910-node cluster, Hadoop sorted 1 terabyte in 209
seconds (just under 3.5 minutes), beating the previous year’s
winner of 297 seconds.
• In November of the same year, Google reported that its
MapReduce implementation sorted 1 terabyte in 68
seconds.
• Then, in April 2009, it was announced that a team at Yahoo!
had used Hadoop to sort 1 terabyte in 62 seconds.
• In the 2014 competition, a team from Databricks were joint
winners of the Gray Sort benchmark. They used a 207-node
Spark cluster to sort 100 terabytes of data in 1,406 seconds,
a rate of 4.27 terabytes per minute.
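The sort records above are easier to compare as throughput. A small sketch that converts each reported (size, time) pair into terabytes per minute, using only the figures quoted on this slide:

```python
def terabytes_per_minute(terabytes: float, seconds: float) -> float:
    """Sort throughput in terabytes per minute."""
    return terabytes / seconds * 60

# (label, terabytes sorted, seconds) as quoted on the slide.
records = [
    ("Hadoop, April 2008", 1, 209),
    ("Google MapReduce, November 2008", 1, 68),
    ("Yahoo! Hadoop, April 2009", 1, 62),
    ("Databricks Spark, 2014", 100, 1406),
]

for label, tb, secs in records:
    print(f"{label}: {terabytes_per_minute(tb, secs):.2f} TB/min")
```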
Hadoop Version
Hadoop Distributed File System (HDFS)
• HDFS is a filesystem designed for storing very large files
with streaming data access patterns, running on clusters
of commodity hardware.
• Very large files: hundreds of megabytes, gigabytes, or terabytes
in size.
• Streaming data access: a write-once, read-many-times pattern.
• Commodity hardware: no expensive, highly reliable hardware is required.
• HDFS is not a good fit for:
• Low-latency data access
• Lots of small files
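Why many small files hurt can be sketched with simple arithmetic. The numbers here are assumptions for illustration: a 128 MiB block size (a common HDFS default) and a rule of thumb of roughly 150 bytes of namenode memory per file or block object. Actual values depend on the cluster configuration and Hadoop version.

```python
import math

BLOCK_SIZE = 128 * 1024**2   # assumed HDFS block size (128 MiB)
BYTES_PER_OBJECT = 150       # assumed rule-of-thumb namenode memory per object

def namenode_objects(file_sizes):
    """Count the file and block objects the namenode must keep in memory."""
    blocks = sum(max(1, math.ceil(size / BLOCK_SIZE)) for size in file_sizes)
    return len(file_sizes) + blocks

one_gib = 1024**3
large = [one_gib]            # one 1 GiB file
small = [1024**2] * 1024     # the same 1 GiB stored as 1,024 files of 1 MiB

for name, files in (("one large file", large), ("many small files", small)):
    objs = namenode_objects(files)
    print(f"{name}: {objs} objects, ~{objs * BYTES_PER_OBJECT} bytes of namenode memory")
```

The same amount of data costs the namenode far more metadata when it is split into small files, because every file and every block is tracked individually.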
Research in Big Data
Implementations of Big Data Analysis
• Google: uses Big Data for search, recommendation, etc.
• Amazon: uses Big Data collected from customers’ behavior for its recommendation system.
• Facebook: uses Big Data analysis for image recognition when tagging, deepfakes, People You May Know, etc.
Papers Related to Big Data
1. Riza, L. S., Pratama, F. D., Piantari, E., & Fashi, M. (2020). Genomic
repeats detection using Boyer-Moore algorithm on Apache Spark
Streaming. Telkomnika, 18(2), 783-791.
2. Baig, M. I., Shuib, L., & Yadegaridehkordi, E. (2020). Big data in education:
a state of the art, limitations, and future research directions.
International Journal of Educational Technology in Higher Education,
17(1), 1-23.
3. Mayabee, T. T., Khan, S., Alam, A., Amin, S., Chowdhury, J. K., Hassan, M.
T., ... & Hasan, M. (2022). Student Performance Monitor: A Big Data
Analytical Application. In Proceedings of International Conference on
Data Science and Applications (pp. 759-771). Springer, Singapore.
Big Data in Bioinformatics
Riza, L. S., Pratama, F. D., Piantari, E., & Fashi, M. (2020). Genomic repeats
detection using Boyer-Moore algorithm on Apache Spark Streaming. Telkomnika,
18(2), 783-791.
Genomic repeats detection using Boyer-Moore algorithm on Apache Spark Streaming
• Identifying and classifying repeats are fundamental annotation tasks, because repeats play a role in the evolution of genomes and in diseases and must be distinguished from other gene types.
• Detecting genomic repeats is essentially a string-matching (pattern-matching) problem: looking for a pattern in a large text.
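The string-matching idea can be sketched as follows. This is the Boyer-Moore-Horspool simplification, which keeps only a bad-character shift table and omits the good-suffix rule of the full Boyer-Moore algorithm; it is an illustration under that assumption, not the implementation used in the paper. The function name and the sample sequence are hypothetical.

```python
def horspool_search(text: str, pattern: str) -> list[int]:
    """Return start indices of every (possibly overlapping) match of pattern in text."""
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return []
    # Shift table: distance from a character's last occurrence (excluding the
    # final pattern position) to the end of the pattern.
    shift = {ch: m - 1 - i for i, ch in enumerate(pattern[:-1])}
    matches = []
    pos = 0
    while pos <= n - m:
        # Compare the current window (a slice comparison keeps the sketch short).
        if text[pos:pos + m] == pattern:
            matches.append(pos)
        # Skip ahead based on the text character aligned with the pattern's end.
        pos += shift.get(text[pos + m - 1], m)
    return matches

genome = "ACGTACGTTTACGT"
print(horspool_search(genome, "ACGT"))
```

The shift table lets the search skip several positions at once when the character under the pattern's last position cannot line up any sooner, which is what makes this family of algorithms attractive for long texts.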
Research Objective
• This research aims to build a big-data computational model and implement the Boyer-Moore algorithm to find string patterns in human chromosome genome data available from the Ensembl site.
• Apache Spark is an open-source cluster computing framework for
large data processing.
Research Method in
Genomic Repeats
• Four working environments:
• On personal computers
• On virtual machines in a Google Cloud project
• On HDFS
• With Apache Spark Streaming
• Data collection (around 3.9 GB): human DNA sequences, which can be downloaded freely from ftp://ftp.Ensembl.Org/pub/release-95/fasta/homo_sapiens/dna/.
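When a long sequence is split across workers, matches that straddle a split boundary are easy to miss. A minimal sketch of one common remedy, overlapping each chunk by `len(pattern) - 1` characters, is shown below. It is plain Python with hypothetical function names and does not reproduce the paper's Spark Streaming setup; `str.find` stands in for the matching algorithm.

```python
def find_all(text: str, pattern: str) -> list[int]:
    """All start offsets of pattern in text, including overlapping matches."""
    hits, i = [], text.find(pattern)
    while i != -1:
        hits.append(i)
        i = text.find(pattern, i + 1)
    return hits

def chunked_search(sequence: str, pattern: str, chunk_size: int) -> list[int]:
    """Search chunk by chunk, as independent workers would, without losing matches."""
    if not pattern or chunk_size < 1:
        raise ValueError("pattern must be non-empty and chunk_size positive")
    overlap = len(pattern) - 1  # extra characters so boundary matches stay visible
    hits = []
    for start in range(0, len(sequence), chunk_size):
        chunk = sequence[start:start + chunk_size + overlap]
        # Each chunk could be handled by a separate worker; offsets are made global.
        hits.extend(start + i for i in find_all(chunk, pattern))
    return hits

sequence = "ACGTACGTTTACGT"
print(chunked_search(sequence, "ACGT", chunk_size=5))
```

Because the overlap is exactly one character shorter than the pattern, every match is reported by the chunk in which it starts and by no other, so no deduplication step is needed.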
Results: Speed
Comparisons
Big Data in Education
Baig, M. I., Shuib, L., & Yadegaridehkordi, E. (2020). Big data in education: a state of
the art, limitations, and future research directions. International Journal of
Educational Technology in Higher Education, 17(1), 1-23.
Big data in education
• In the educational realm, a large volume of data is produced through
online courses, teaching and learning activities.
• Academic data can help teachers analyze their teaching pedagogy and effect changes according to students’ needs and requirements.
• The large-scale administrative data can play a tremendous role in
managing various educational problems.
• Therefore, it is essential for professionals to understand the effectiveness of big data in education in order to minimize educational issues.
What research themes have been addressed in educational studies of big data?
Roadmap: Big Data in Education
Student Performance Monitor:
A Big Data Analytical Application
Mayabee, T. T., Khan, S., Alam, A., Amin, S., Chowdhury, J. K., Hassan, M. T., ... &
Hasan, M. (2022). Student Performance Monitor: A Big Data Analytical Application.
In Proceedings of International Conference on Data Science and Applications (pp.
759-771). Springer, Singapore.
Objectives
• To analyze Program Learning Outcomes (PLOs) in Outcome-Based Education (OBE) using Big Data analytics.
The outcome-based education (OBE) system
is an educational theory where every part of
the curriculum is centered around outcomes
or goals that a student must accomplish to
successfully complete their program.
Proposed Analysis
Results
Other Example: Data Analysis in Education
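[Figure: the real world is observed by sensors (Sensor 1 … Sensor k), which produce non-text data and text data. Joint mining of non-text and text data yields multiple predictors (features) for a predictive model, whose predicted values of real-world variables are used to change the world. In the education setting, the labelled actors are Teacher and Student.]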
Big Data for Education
[Figure: MOOCs and small classrooms positioned on Scalability and Quality axes; “Big Data Technology” leads to a Scalable Intelligent MOOC.]
Towards an intelligent MOOC:
• Automate grading with machine learning
• Automate question answering on forums
[Figure: traditional manual grading takes submitted assignments to graded assignments (grades such as 93, 85, …).]
[Figure: proposed automated grading, with stages labelled submitted assignments, clustering, batch grading, multi-dimensional grade predictor, grade verification, and graded assignments; outputs labelled detailed grading results, improvement, and performance & behavior analysis.]
References
• Baig, M. I., Shuib, L., & Yadegaridehkordi, E. (2020). Big data in education: a state of the art,
limitations, and future research directions. International Journal of Educational Technology in
Higher Education, 17(1), 1-23.
• Big Data Education System Leaderboard, University of Illinois at Urbana-Champaign, The Data and Information Systems Laboratories,
https://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=&cad=rja&uact=8&ved=2ahUKEwiNrurljs7yAhUL8HMBHR89Ag8QFnoECAIQAQ&url=http%3A%2F%2Ftimes.cs.uiuc.edu%2Fczhai%2Fpub%2Fbigdata-education-zhai.pptx&usg=AOvVaw30IHA6b1UxmFFK0SXCA5hr
• Favaretto, M., De Clercq, E., Schneble, C. O., & Elger, B. S. (2020). What is your definition of Big
Data? Researchers’ understanding of the phenomenon of the decade. PloS one, 15(2), e0228987.
• Mayabee, T. T., Khan, S., Alam, A., Amin, S., Chowdhury, J. K., Hassan, M. T., ... & Hasan, M. (2022).
Student Performance Monitor: A Big Data Analytical Application. In Proceedings of International
Conference on Data Science and Applications (pp. 759-771). Springer, Singapore.
• Riza, L. S., Pratama, F. D., Piantari, E., & Fashi, M. (2020). Genomic repeats detection using Boyer-Moore algorithm on Apache Spark Streaming. Telkomnika, 18(2), 783-791.
