Utilizing Big Data in Research
Lala Septem Riza
Graduate School
2023
Outline
1. Introduction to Data Science
2. Phenomena and Definition of Big Data
3. Platforms, Technologies, Tools, and Methods in Big Data Analysis
4. Implementations and Research in Big Data
Introduction to Data Science
• Data science is the study that focuses on knowledge extraction from
data: data collection, preparation, analysis, visualization,
management, recommendation, etc.
• Data science is an interdisciplinary field that requires hacking skills
(i.e., programming), math and statistics knowledge, and substantive
expertise in a field of science.
Processes in Data Science
1. Objectives: ask the right questions
to identify the problem.
2. Data collection: gather data relevant
to the problem.
3. Data preprocessing: explore the data
and correct errors (cleaning
and organizing).
4. Computational and data modeling:
descriptive, predictive, etc.
5. Reporting/dissemination/publication.
Data Science: Software and
Implementations|4
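The five steps above can be illustrated end to end on a toy problem. This Python sketch assumes a small hypothetical list of exam scores as the collected data and uses a descriptive summary as the model; the function names are illustrative only.

```python
from statistics import mean, median

# Step 1 (objective): what is the typical exam score in this class?

def collect():
    """Step 2: data collection (hypothetical raw scores, some missing or invalid)."""
    return ["78", "85", None, "92", "abc", "64", "101", "70"]

def preprocess(raw):
    """Step 3: drop missing, non-numeric, and out-of-range (not 0..100) values."""
    clean = []
    for value in raw:
        if value is None:
            continue
        try:
            score = float(value)
        except ValueError:
            continue
        if 0 <= score <= 100:
            clean.append(score)
    return clean

def describe(scores):
    """Step 4: a simple descriptive model of the cleaned data."""
    return {"n": len(scores), "mean": mean(scores), "median": median(scores)}

def report(summary):
    """Step 5: turn the model output into a human-readable report."""
    return f"n={summary['n']}, mean={summary['mean']:.1f}, median={summary['median']:.1f}"

summary = describe(preprocess(collect()))
print(report(summary))  # n=5, mean=77.8, median=78.0
```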
Final Goals of Data Analysis
1. Decision analytics: supports decision-making with visual analytics
that reflect reasoning.
2. Descriptive analytics: provides insight from historical data through
reporting, scorecards, clustering, etc.
3. Predictive analytics: builds predictive models using statistical
and machine learning techniques.
4. Prescriptive analytics: recommends decisions using optimization,
simulation, etc.
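The difference between descriptive and predictive analytics can be seen on a small example. This Python sketch assumes hypothetical study-hours and exam-score data: the average score describes what already happened, while an ordinary least-squares line predicts a score for an unseen input. Prescriptive analytics would go a step further, for example by choosing the study time needed to meet a target score.

```python
from statistics import mean

# Hypothetical historical data: weekly study hours and final exam scores
hours = [2, 4, 6, 8, 10]
scores = [55, 62, 70, 77, 86]

# Descriptive analytics: summarize what already happened
average_score = mean(scores)

# Predictive analytics: fit score = a + b * hours by ordinary least squares
x_bar, y_bar = mean(hours), mean(scores)
b = sum((x - x_bar) * (y - y_bar) for x, y in zip(hours, scores)) / sum(
    (x - x_bar) ** 2 for x in hours
)
a = y_bar - b * x_bar

def predict(study_hours):
    """Predict an exam score for an unseen number of study hours."""
    return a + b * study_hours

print(f"average score: {average_score}")                   # descriptive
print(f"model: score = {a:.2f} + {b:.2f} * hours")         # fitted model
print(f"predicted score for 12 hours: {predict(12):.1f}")  # predictive
```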
Phenomena of Big Data
Figure: Volume of digital data, 2010 to 2025 (in zettabytes; 1 ZB = 10^21 bytes).
Internet Activities
Big Data
Petabytes: 1 PB = 10^15 bytes
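The units on these slides are decimal (SI) prefixes, each 1,000 times the previous one, so 1 PB = 10^15 bytes, 1 ZB = 10^21 bytes, and 1 ZB = 10^6 PB. A small Python helper makes such conversions explicit:

```python
# Decimal (SI) byte units, as used on the slides: each step is a factor of 1000
UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB", "ZB"]
BYTES_PER_UNIT = {unit: 1000 ** i for i, unit in enumerate(UNITS)}

def convert(amount, from_unit, to_unit):
    """Convert an amount between decimal byte units (e.g. ZB -> PB)."""
    return amount * BYTES_PER_UNIT[from_unit] / BYTES_PER_UNIT[to_unit]

print(BYTES_PER_UNIT["PB"] == 10**15)  # True
print(BYTES_PER_UNIT["ZB"] == 10**21)  # True
print(convert(1, "ZB", "PB"))          # 1000000.0
```

Some operating systems report binary units instead (1 PiB = 2^50 bytes, about 1.126 × 10^15 bytes), which explains small discrepancies between reported sizes.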
The Shift of the Marketplace
What is Big Data?
1. Volume: the huge amounts of data being
stored.
2. Velocity: the lightning speed at which data
streams must be processed and analyzed.
3. Variety: the different sources and
forms from which data is collected, such as
numbers, text, video, images, and audio.
The 9 Vs of Big Data Definitions
Historical/Traditional technologies don’t work
because …
Challenges in Big Data
Technologies and Methods in Big Data Analysis
Issues in Big Data Technologies:
1. Computational models: how the data are
processed and analyzed → data analysis/data
science.
2. Database/storage frameworks: technologies and
mechanisms to write, read, and manage Big Data
efficiently, while also handling the fault tolerance,
availability, consistency, scalability, and
heterogeneity of Big Data.
Big Data Platform
Big Data Platforms
• Redundant and reliable: platforms replicate data automatically,
so when a machine goes down, no data is lost.
• Runs on commodity hardware: no need to buy special hardware,
expensive RAID arrays, or redundant hardware; reliability is built into
the software.
• Scale out rather than scale up.
• Bring code to data rather than data to code.
• Fault tolerant: deal with failures.
• Break the disk-read barrier by reading from many disks in parallel.
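The "redundant and reliable" property can be illustrated with a toy simulation of block replication. This Python sketch uses a hypothetical round-robin placement (not HDFS's actual rack-aware policy) and hypothetical node and block names; it checks that with three replicas on distinct nodes, no block is lost when any two nodes fail.

```python
from itertools import combinations

def place_replicas(blocks, nodes, replication=3):
    """Assign each block to `replication` distinct nodes.

    Simple round-robin placement for illustration only; real HDFS uses a
    rack-aware placement policy.
    """
    if replication > len(nodes):
        raise ValueError("replication factor cannot exceed the number of nodes")
    placement = {}
    for i, block in enumerate(blocks):
        placement[block] = {nodes[(i + k) % len(nodes)] for k in range(replication)}
    return placement

def surviving_blocks(placement, failed_nodes):
    """Blocks that still have at least one replica on a live node."""
    failed = set(failed_nodes)
    return {block for block, holders in placement.items() if holders - failed}

blocks = [f"blk_{i}" for i in range(10)]  # hypothetical block IDs
nodes = ["node1", "node2", "node3", "node4", "node5"]  # hypothetical nodes
placement = place_replicas(blocks, nodes)

# With 3 replicas on distinct nodes, any 2 simultaneous node failures lose no block
all_safe = all(
    surviving_blocks(placement, pair) == set(blocks)
    for pair in combinations(nodes, 2)
)
print(all_safe)  # True
```

Failing three nodes at once can lose blocks in this toy setup, which is why the replication factor is chosen according to the failure rate the cluster must tolerate.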
Introduction to Apache Hadoop
Hadoop History Timeline
• In April 2008, Hadoop broke a world record to become the
fastest system to sort an entire terabyte of data. Running on
a 910-node cluster, Hadoop sorted 1 terabyte in 209
seconds (just under 3.5 minutes), beating the previous year’s
winner of 297 seconds.
• In November of the same year, Google reported that its
MapReduce implementation sorted 1 terabyte in 68
seconds.
• Then, in April 2009, it was announced that a team at Yahoo!
had used Hadoop to sort 1 terabyte in 62 seconds.
• In the 2014 competition, a team from Databricks were joint
winners of the Gray Sort benchmark. They used a 207-node
Spark cluster to sort 100 terabytes of data in 1,406 seconds,
a rate of 4.27 terabytes per minute.
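The throughput figures above can be cross-checked with simple arithmetic. This sketch only restates the numbers quoted on the slide and converts each result to terabytes per minute; it confirms, for example, that 100 TB in 1,406 s is about 4.27 TB/min and that 209 s is just under 3.5 minutes.

```python
# Sort benchmark figures quoted above: (system, terabytes sorted, seconds)
RESULTS = [
    ("Hadoop, April 2008 (910 nodes)", 1, 209),
    ("Google MapReduce, November 2008", 1, 68),
    ("Hadoop at Yahoo!, April 2009", 1, 62),
    ("Spark at Databricks, 2014 (207 nodes)", 100, 1406),
]

def tb_per_minute(terabytes, seconds):
    """Sorting throughput in terabytes per minute."""
    return terabytes / seconds * 60

for system, tb, seconds in RESULTS:
    print(f"{system}: {tb_per_minute(tb, seconds):.2f} TB/min")
# Hadoop 2008: 0.29, Google: 0.88, Yahoo!: 0.97, Databricks: 4.27
```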
Hadoop Version
Hadoop Distributed File System (HDFS)
• HDFS is a filesystem designed for storing very large files
with streaming data access patterns, running on clusters
of commodity hardware.
• Very large files: hundreds of megabytes, gigabytes, or terabytes
in size.
• Streaming data access: a write-once, read-many-times pattern.
• Commodity hardware: runs on clusters of inexpensive, commonly available hardware.
• HDFS is not a good fit for:
  • Low-latency data access
  • Lots of small files
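Why HDFS favors a few large files over many small ones can be shown with block arithmetic. A minimal Python sketch, assuming the common Hadoop 2.x+ defaults of a 128 MiB block size (`dfs.blocksize`) and a replication factor of 3 (`dfs.replication`); actual clusters may be configured differently. The NameNode keeps metadata for every file and block in memory, so storing the same 10 GiB as many small files multiplies the number of objects it must track.

```python
BLOCK_SIZE = 128 * 1024**2  # assumed dfs.blocksize: 128 MiB (Hadoop 2.x+ default)
REPLICATION = 3             # assumed dfs.replication: 3 (default)

def hdfs_footprint(file_sizes):
    """Return (namespace objects, blocks, raw bytes on disk) for files of the given sizes in bytes.

    Each file and each block is a metadata object held in NameNode memory.
    A partial last block only occupies its actual size on disk.
    """
    blocks = sum(-(-size // BLOCK_SIZE) for size in file_sizes)  # ceiling division
    raw_bytes = sum(file_sizes) * REPLICATION
    return len(file_sizes) + blocks, blocks, raw_bytes

GIB, MIB = 1024**3, 1024**2
one_big_file = hdfs_footprint([10 * GIB])         # one 10 GiB file
many_small_files = hdfs_footprint([MIB] * 10240)  # same 10 GiB as 10,240 files of 1 MiB

print(one_big_file)      # (81, 80, 32212254720)
print(many_small_files)  # (20480, 10240, 32212254720)
```

The raw storage is identical in both cases, but the small-file layout needs roughly 250 times as many metadata objects.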
Research in Big Data
Implementations of Big Data Analysis
• Google: uses Big Data for search,
recommendation, etc.
• Amazon: collects customer behavior data
at scale to power its recommendation
system.
• Facebook: uses Big Data analysis for image
recognition in photo tagging, deepfakes, People
You May Know, etc.
Related Papers on Big Data
1. Riza, L. S., Pratama, F. D., Piantari, E., & Fashi, M. (2020). Genomic
repeats detection using Boyer-Moore algorithm on Apache Spark
Streaming. Telkomnika, 18(2), 783-791.
2. Baig, M. I., Shuib, L., & Yadegaridehkordi, E. (2020). Big data in education:
a state of the art, limitations, and future research directions.
International Journal of Educational Technology in Higher Education,
17(1), 1-23.
3. Mayabee, T. T., Khan, S., Alam, A., Amin, S., Chowdhury, J. K., Hassan, M.
T., ... & Hasan, M. (2022). Student Performance Monitor: A Big Data
Analytical Application. In Proceedings of International Conference on
Data Science and Applications (pp. 759-771). Springer, Singapore.
Big Data in Bioinformatics
Riza, L. S., Pratama, F. D., Piantari, E., & Fashi, M. (2020). Genomic repeats
detection using Boyer-Moore algorithm on Apache Spark Streaming. Telkomnika,
18(2), 783-791.
Genomic repeats detection using Boyer-Moore algorithm on Apache Spark Streaming
• Identifying and classifying repeats are fundamental
genome annotation tasks, because repeats are linked to
the evolution of genomes and to diseases, and they must
be distinguished from other gene types.
• Genomic repeat detection is essentially a string-matching
(pattern-matching) problem: searching for a pattern in
a very large text.
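String matching of this kind is what the Boyer-Moore algorithm does efficiently: it compares the pattern right to left and, on a mismatch, can skip ahead by several positions. Below is a simplified Python sketch that uses only the bad-character rule (the full algorithm adds the good-suffix rule); it is an illustration, not the code used by Riza et al. (2020), and the DNA string is a toy example.

```python
def bad_character_table(pattern):
    """Map each character to its last (rightmost) index in the pattern."""
    return {ch: i for i, ch in enumerate(pattern)}

def boyer_moore_search(text, pattern):
    """Return the start index of every occurrence of `pattern` in `text`.

    Simplified Boyer-Moore: only the bad-character rule is used to compute
    shifts (the full algorithm also uses the good-suffix rule).
    """
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return []
    last = bad_character_table(pattern)
    matches = []
    s = 0  # current alignment of the pattern's first character in the text
    while s <= n - m:
        j = m - 1
        # Compare pattern and text from right to left
        while j >= 0 and pattern[j] == text[s + j]:
            j -= 1
        if j < 0:
            matches.append(s)
            # Align the next text character with its last occurrence in the pattern
            s += m - last.get(text[s + m], -1) if s + m < n else 1
        else:
            # Align the mismatched text character with its last occurrence in the
            # pattern (or skip past it); always move forward by at least one
            s += max(1, j - last.get(text[s + j], -1))
    return matches

sequence = "GATTACAGATTACATTAGATTACA"  # toy DNA string
print(boyer_moore_search(sequence, "GATTACA"))  # [0, 7, 17]
```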
Research Objective
• This research aims to build a big-data computational model
and implement the Boyer-Moore algorithm to find string
patterns in human chromosome genome data obtained from the
Ensembl pages.
• Apache Spark is an open-source cluster computing framework for
large-scale data processing.
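When a large genome is split across Spark partitions, an occurrence that straddles a partition boundary would be missed unless neighboring chunks overlap by `len(pattern) - 1` characters. The plain-Python sketch below illustrates that idea sequentially; in PySpark the chunks could, for instance, be distributed with `SparkContext.parallelize` and searched with `flatMap`. The chunking scheme and function names are assumptions, not the method described in the paper.

```python
def split_with_overlap(text, chunk_size, overlap):
    """Yield (offset, chunk) pairs; each chunk repeats the previous chunk's last `overlap` characters."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must be larger than overlap")
    step = chunk_size - overlap
    for offset in range(0, len(text), step):
        yield offset, text[offset:offset + chunk_size]

def find_all(text, pattern):
    """All (possibly overlapping) start positions of `pattern`, using str.find."""
    hits, i = [], text.find(pattern)
    while i != -1:
        hits.append(i)
        i = text.find(pattern, i + 1)
    return hits

def chunked_search(text, pattern, chunk_size):
    """Search each chunk independently (as separate workers would) and merge global offsets.

    With an overlap of len(pattern) - 1, every match lies wholly inside exactly
    one chunk, so nothing is missed and nothing is counted twice.
    """
    overlap = len(pattern) - 1
    results = []
    for offset, chunk in split_with_overlap(text, chunk_size, overlap):
        results.extend(offset + i for i in find_all(chunk, pattern))
    return sorted(results)

genome = "ACGT" * 50 + "TTAGGG" * 3 + "ACGT" * 10  # hypothetical toy sequence
print(chunked_search(genome, "TTAGGG", chunk_size=16))  # [200, 206, 212]
```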
Research Method in Genomic Repeats
• Four working environments:
  • On personal computers
  • On virtual machines in a Google Cloud project
  • On HDFS
  • With Apache Spark Streaming
• Data collection (approximately 3.9 GB):
human DNA sequences, freely downloadable from
ftp://ftp.Ensembl.Org/pub/release-95/fasta/homo_sapiens/dna/
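The Ensembl DNA files are distributed as gzip-compressed FASTA text: a `>` header line followed by sequence lines. A minimal Python parser, assuming plain FASTA input (no specific file names from the release-95 directory are assumed here):

```python
import gzip
import io

def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA text stream.

    Lines starting with '>' are headers; the following lines are joined into
    one sequence. Case is preserved (soft-masked files use lowercase letters).
    """
    header, parts = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(parts)
            header, parts = line[1:], []
        else:
            parts.append(line)
    if header is not None:
        yield header, "".join(parts)

def open_fasta(path):
    """Open a plain or gzip-compressed (.gz) FASTA file in text mode."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)

# Usage on an in-memory example instead of a real downloaded file
example = io.StringIO(">chr1 toy\nACGTNN\nacgt\n>chr2 toy\nGGTT\n")
records = list(read_fasta(example))
print(records)  # [('chr1 toy', 'ACGTNNacgt'), ('chr2 toy', 'GGTT')]
```

Because `read_fasta` is a generator, one chromosome at a time is held in memory, which matters for a multi-gigabyte genome.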
Results: Speed Comparisons
Big Data in Education
Baig, M. I., Shuib, L., & Yadegaridehkordi, E. (2020). Big data in education: a state of
the art, limitations, and future research directions. International Journal of
Educational Technology in Higher Education, 17(1), 1-23.
Big data in education
• In education, large volumes of data are produced through
online courses and teaching and learning activities.
• Academic data can help teachers analyze their teaching pedagogy
and effect changes according to students’ needs and requirements.
• Large-scale administrative data can play a major role in
managing various educational problems.
• Therefore, professionals need to understand how effective
big data is in education in order to minimize
educational issues.
What research themes have been addressed in educational studies of big data?
Roadmap: Big Data in Education
Student Performance Monitor:
A Big Data Analytical Application
Mayabee, T. T., Khan, S., Alam, A., Amin, S., Chowdhury, J. K., Hassan, M. T., ... &
Hasan, M. (2022). Student Performance Monitor: A Big Data Analytical Application.
In Proceedings of International Conference on Data Science and Applications (pp.
759-771). Springer, Singapore.
Objectives
• To analyze Program Learning Outcomes (PLOs) in Outcome-Based
Education (OBE) using Big Data analytics.
The outcome-based education (OBE) system
is an educational theory where every part of
the curriculum is centered around outcomes
or goals that a student must accomplish to
successfully complete their program.
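PLO attainment is often summarized as the share of students who reach a target score on each outcome. The sketch below assumes hypothetical student scores and a hypothetical 40% threshold; it is one simple way to compute such a summary and is not the analysis used by Mayabee et al. (2022).

```python
# Hypothetical assessment data: student -> {PLO: score in percent}
SCORES = {
    "S1": {"PLO1": 72, "PLO2": 55, "PLO3": 38},
    "S2": {"PLO1": 45, "PLO2": 80, "PLO3": 62},
    "S3": {"PLO1": 30, "PLO2": 66, "PLO3": 71},
    "S4": {"PLO1": 88, "PLO2": 52, "PLO3": 35},
}
THRESHOLD = 40  # hypothetical minimum score (percent) for achieving a PLO

def plo_attainment(scores, threshold=THRESHOLD):
    """Return, for each PLO, the fraction of students scoring at least `threshold`."""
    achieved, assessed = {}, {}
    for per_plo in scores.values():
        for plo, value in per_plo.items():
            assessed[plo] = assessed.get(plo, 0) + 1
            achieved[plo] = achieved.get(plo, 0) + (1 if value >= threshold else 0)
    return {plo: achieved[plo] / assessed[plo] for plo in sorted(assessed)}

print(plo_attainment(SCORES))  # {'PLO1': 0.75, 'PLO2': 1.0, 'PLO3': 0.5}
```

At Big Data scale the same per-PLO aggregation would typically run as a grouped count over millions of assessment records rather than a Python loop.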
Proposed Analysis
Results
Other Example: Data Analysis in Education
[Figure: data about the real world is captured by sensors 1 … k (non-text data) and as text data; joint mining of the non-text and text data yields multiple predictors (features) for a predictive model, whose predicted values of real-world variables are used to "change the world" (teacher, student).]
Big Data for Education
[Figure: MOOCs offer scalability while small classrooms offer quality; "Big Data technology" aims at a scalable, intelligent MOOC that automates grading with machine learning and automates question answering on forums ("Towards Intelligent MOOC").]
[Figure: Traditional manual grading turns submitted assignments into graded assignments (grades such as 93, 85, …). The proposed automated grading takes submitted assignments through clustering, batch grading, a multi-dimensional grade predictor, and grade verification to produce graded assignments and detailed grading results, which support improvement and performance & behavior analysis.]
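The clustering and batch-grading steps of the proposed automated grading can be sketched as follows: group similar answers, have a teacher grade one representative per group, and propagate that grade. This Python sketch assumes hypothetical short-answer submissions and uses greedy leader clustering on word-set Jaccard similarity with an assumed threshold of 0.6; the original pipeline's clustering method and grade predictor are not specified here.

```python
def tokens(answer):
    """Lowercased word set of an answer."""
    return set(answer.lower().split())

def jaccard(a, b):
    """Jaccard similarity of two sets (1.0 for two empty sets)."""
    return len(a & b) / len(a | b) if a | b else 1.0

def cluster_submissions(submissions, threshold=0.6):
    """Greedy leader clustering: a submission joins the first cluster whose leader
    is at least `threshold` similar to it, otherwise it starts a new cluster."""
    clusters = []  # list of (leader word set, member IDs)
    for sid, text in submissions.items():
        words = tokens(text)
        for leader, members in clusters:
            if jaccard(words, leader) >= threshold:
                members.append(sid)
                break
        else:
            clusters.append((words, [sid]))
    return [members for _, members in clusters]

def batch_grade(clusters, grade_one):
    """Grade one representative per cluster and copy its grade to every member.
    In practice a teacher would still verify the propagated grades."""
    grades = {}
    for members in clusters:
        grade = grade_one(members[0])
        for sid in members:
            grades[sid] = grade
    return grades

submissions = {  # hypothetical short answers
    "a1": "HDFS stores large files in blocks",
    "a2": "hdfs stores large files in blocks",
    "a3": "Spark keeps data in memory",
    "a4": "HDFS stores very large files in blocks",
}
clusters = cluster_submissions(submissions)
teacher_grades = {"a1": 90, "a3": 60}  # the teacher grades only the representatives
print(clusters)  # [['a1', 'a2', 'a4'], ['a3']]
print(batch_grade(clusters, teacher_grades.__getitem__))
```

The threshold trades grading effort against accuracy: a lower value yields fewer, looser clusters and therefore more grades that need verification.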
References
• Baig, M. I., Shuib, L., & Yadegaridehkordi, E. (2020). Big data in education: a state of the art,
limitations, and future research directions. International Journal of Educational Technology in
Higher Education, 17(1), 1-23.
• Big Data Education System Leaderboard, University of Illinois at Urbana-Champaign, The Data and
Information Systems Laboratories,
http://times.cs.uiuc.edu/czhai/pub/bigdata-education-zhai.pptx
• Favaretto, M., De Clercq, E., Schneble, C. O., & Elger, B. S. (2020). What is your definition of Big
Data? Researchers’ understanding of the phenomenon of the decade. PLoS ONE, 15(2), e0228987.
• Mayabee, T. T., Khan, S., Alam, A., Amin, S., Chowdhury, J. K., Hassan, M. T., ... & Hasan, M. (2022).
Student Performance Monitor: A Big Data Analytical Application. In Proceedings of International
Conference on Data Science and Applications (pp. 759-771). Springer, Singapore.
• Riza, L. S., Pratama, F. D., Piantari, E., & Fashi, M. (2020). Genomic repeats detection using
Boyer-Moore algorithm on Apache Spark Streaming. Telkomnika, 18(2), 783-791.
