Handling Big Data
Deddy Setyadi
www.elakiri.com
5 billion GB of data produced in:
... - 2003
2 days in 2011
10 minutes in 2013
Live stats (2016)
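Taking the figures on this slide at face value (5 billion GB of data produced in 2 days in 2011, and again in 10 minutes in 2013), the implied average production rates work out as follows. This is only arithmetic on the slide's own numbers, not an independent measurement:

```latex
\begin{aligned}
r_{2011} &= \frac{5\times10^{9}\ \text{GB}}{2 \times 86\,400\ \text{s}}
          = \frac{5\times10^{9}}{1.728\times10^{5}}\ \text{GB/s}
          \approx 2.9\times10^{4}\ \text{GB/s} \\
r_{2013} &= \frac{5\times10^{9}\ \text{GB}}{10 \times 60\ \text{s}}
          = \frac{5\times10^{9}}{6\times10^{2}}\ \text{GB/s}
          \approx 8.3\times10^{6}\ \text{GB/s} \\
\frac{r_{2013}}{r_{2011}} &= \frac{172\,800\ \text{s}}{600\ \text{s}} = 288
\end{aligned}
```

In other words, by the slide's numbers the rate of data production grew roughly 288-fold in two years.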
Where does this data come from?
Activity: listening to music, reading a book, searching, shopping, etc.
Conversation: our conversations on social media are now digitally recorded.
Photo and Video: we upload and share hundreds of thousands of photos on social media sites every second.
Sensor: we are increasingly surrounded by sensors that collect and share data.
The Internet of Things: we now have smart TVs that are able to collect and process data.
The basic idea behind the phrase 'Big Data' is that everything we do is increasingly leaving a digital trace (or data), which we (and others) can use and analyse.
Big data: as the name suggests, a really large amount of data. It is a collection of datasets so large that they cannot be processed using traditional computing techniques.
Big data involves a huge volume, high velocity, and a wide, growing variety of data.
Structured
● Database
● Census records
● Economic data
● Phone numbers
Semi-structured
● JSON
● XML
Unstructured
● Word
● PDF
● Text
● Media logs
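To make the three categories concrete, here is a minimal Python sketch using only the standard library. The sample records (a CSV table, a JSON document, a free-text sentence) are invented for illustration: structured data has a fixed schema and can be read column by column, semi-structured data carries its own possibly varying field names, and unstructured data has to be interpreted before it yields any fields at all.

```python
import csv
import io
import json
import re

# Structured: every row follows the same fixed schema (like a database table).
structured_text = "name,phone\nAlice,555-0100\nBob,555-0199\n"
rows = list(csv.DictReader(io.StringIO(structured_text)))

# Semi-structured: JSON carries its own field names; records may differ in shape.
semi_structured_text = '[{"name": "Alice", "tags": ["music"]}, {"name": "Bob"}]'
records = json.loads(semi_structured_text)
# Fields are read defensively because not every record has them.
tags = [r.get("tags", []) for r in records]

# Unstructured: free text has no schema; structure must be extracted (here, with a regex).
unstructured_text = "Alice called Bob at 555-0199 and later emailed him."
phones = re.findall(r"\d{3}-\d{4}", unstructured_text)

print(rows[1]["phone"])  # 555-0199
print(tags)              # [['music'], []]
print(phones)            # ['555-0199']
```

The effort needed to get usable fields grows from left to right across the three categories, which is one reason unstructured data is the hardest to process at scale.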
Benefits of Big Data
https://www.youtube.com/watch?v=HqsBensINkE
Big Data Technologies
Operational Big Data
This includes systems like MongoDB that provide operational capabilities for real-time, interactive workloads where data is primarily captured and stored.
NoSQL big data systems are designed to allow massive computations to be run inexpensively and efficiently. This makes operational big data workloads much easier to manage, cheaper, and faster to implement.
Analytical Big Data
This includes systems like Massively Parallel Processing (MPP) database systems and MapReduce that provide analytical capabilities for retrospective and complex analysis.
A system based on MapReduce can be scaled up from a single server to thousands of high-end and low-end machines.
Big Data Solutions
Traditional Approach
In this approach, an enterprise has a computer to store and process big data. The data is stored in an RDBMS, and the required data is processed and presented to users for analysis.
tutorialspoint.com
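As a small-scale illustration of this traditional approach, the sketch below uses Python's built-in `sqlite3` module as a stand-in for an enterprise RDBMS: data is stored in a table on a single machine, the required subset is processed with SQL, and the result is presented to the user. The `sales` table and its values are hypothetical.

```python
import sqlite3

# A single in-memory database stands in for the enterprise's one RDBMS server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")  # hypothetical table
conn.executemany(
    "INSERT INTO sales (region, amount) VALUES (?, ?)",
    [("north", 120), ("south", 80), ("north", 50), ("east", 200)],
)

# Process the required data: total sales per region, computed by the database itself.
report = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()

# Present the result to the user for analysis.
for region, total in report:
    print(f"{region}: {total}")
conn.close()
```

Everything here runs on one computer, which is exactly why this pattern stops scaling once the data outgrows that machine's storage or processor.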
Google’s Solution
Google solved this problem using an algorithm called MapReduce. This algorithm divides the task into small parts, assigns those parts to many computers connected over the network, and collects the results to form the final result dataset.
tutorialspoint.com
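The divide, assign, and collect pattern described above can be sketched in Python as a word count. This is not Google's implementation: a thread pool stands in for the computers on the network, and the function names (`split_into_parts`, `process_part`, `run_job`) are hypothetical. The point is the shape of the computation: each worker sees only its own part, and the partial results are merged at the end.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_into_parts(lines, num_parts):
    """Divide the task: deal the input lines out into roughly equal parts."""
    return [lines[i::num_parts] for i in range(num_parts)]


def process_part(part):
    """Work done by one 'computer': count the words in its own part only."""
    counts = Counter()
    for line in part:
        counts.update(line.lower().split())
    return counts


def run_job(lines, num_workers=3):
    parts = split_into_parts(lines, num_workers)
    # Assign each part to a worker; threads stand in for networked machines here.
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partial_results = list(pool.map(process_part, parts))
    # Collect the partial results into the final result dataset.
    final = Counter()
    for partial in partial_results:
        final.update(partial)
    return final


documents = ["big data is big", "data moves fast", "big clusters process data"]
result = run_job(documents)
print(result["big"], result["data"])  # 3 3
```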
Hadoop
Hadoop runs applications using the MapReduce algorithm, where the data is processed in parallel on different CPU nodes. In short, the Hadoop framework makes it possible to develop applications that run on clusters of computers and perform complete statistical analysis of huge amounts of data.
tutorialspoint.com
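Hadoop can also run MapReduce jobs written in languages other than Java through its Hadoop Streaming utility. There the mapper and reducer are programs that read lines from standard input and write tab-separated key/value lines to standard output, and the framework sorts the mapper output by key before it reaches the reducer. The sketch below assumes that convention and simulates it locally with Python generators; it does not call any Hadoop API, and the function names are illustrative.

```python
from itertools import groupby


def mapper(lines):
    """Emit one 'word<TAB>1' line per word, as a streaming mapper writes to stdout."""
    for line in lines:
        for word in line.lower().split():
            yield f"{word}\t1"


def reducer(sorted_lines):
    """Sum the counts per word; relies on the input being sorted by key."""
    pairs = (line.split("\t", 1) for line in sorted_lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


# Local simulation of the pipeline: cat input | mapper | sort | reducer
text = ["big data is big", "data moves fast"]
output = list(reducer(sorted(mapper(text))))
print("\n".join(output))
```

Because the reducer only ever compares neighbouring lines, it needs constant memory per key, which is what lets the same script handle inputs far larger than one machine's RAM once the framework does the sorting.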
Hadoop Architecture
tutorialspoint.com
MapReduce
1. Data
2. Map: converts a set of data into another set of data, in which individual elements are broken down into tuples (key/value pairs).
3. Reduce: combines the shuffle stage and the reduce stage, processing the mapper output to produce a new set of output, which is stored in HDFS.
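The three stages can be traced in plain Python. This is a single-process sketch, not Hadoop code: the sensor readings are made up, and in a real cluster each phase would run on many nodes with the final output written to HDFS rather than printed.

```python
from collections import defaultdict

# Step 1 - Data: hypothetical sensor readings, one "city,temperature" record per line.
readings = ["Jakarta,31", "Bandung,24", "Jakarta,33", "Bandung,22", "Surabaya,32"]


def map_phase(records):
    """Map: break each record down into a (key, value) tuple."""
    for record in records:
        city, temp = record.split(",")
        yield city, int(temp)


def shuffle_phase(pairs):
    """Shuffle: group all values that share a key, so one reducer sees one key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: produce a new, smaller set of output (here, the maximum per city)."""
    return {key: max(values) for key, values in groups.items()}


result = reduce_phase(shuffle_phase(map_phase(readings)))
print(result)  # {'Jakarta': 33, 'Bandung': 24, 'Surabaya': 32}
```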
MapReduce
http://mm-tom.s3.amazonaws.com/blog/MapReduce.png
MapReduce
noviardisyamsuir.blogspot.com
HDFS
● Fault detection and recovery: HDFS should have mechanisms for quick and automatic fault detection and recovery.
● Huge datasets: HDFS should have hundreds of nodes per cluster to manage applications with huge datasets.
● Hardware at data: a requested task can be done efficiently when the computation takes place near the data, which reduces network traffic and increases throughput.
tutorialspoint.com
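The fault-tolerance and "hardware at data" points rely on HDFS storing each file as blocks, with each block replicated on several nodes (three by default). The toy model below assumes that scheme: it places replicas at random (real HDFS uses a rack-aware placement policy), shows that losing two nodes loses no blocks when every block has three replicas, and picks a node that already holds a block to run a task there. All function names are hypothetical.

```python
import random

REPLICATION = 3  # HDFS's default replication factor (set by dfs.replication)


def place_blocks(num_blocks, nodes, replication=REPLICATION, seed=0):
    """Assign each block to `replication` distinct nodes (random placement, toy model only)."""
    rng = random.Random(seed)
    return {block: set(rng.sample(nodes, replication)) for block in range(num_blocks)}


def readable_blocks(placement, failed):
    """Fault tolerance: a block survives as long as one replica is on a live node."""
    return {b for b, holders in placement.items() if holders - failed}


def pick_node(block, placement, failed):
    """Data locality: prefer running the task on a live node that already holds the block."""
    local = sorted(placement[block] - failed)
    return local[0] if local else None


nodes = ["node1", "node2", "node3", "node4", "node5"]
placement = place_blocks(num_blocks=4, nodes=nodes)
failed = {"node1", "node2"}  # two machines go down
print(len(readable_blocks(placement, failed)))  # 4: every block still has a live replica
print(pick_node(0, placement, failed))
```

With three replicas on distinct nodes, any two simultaneous node failures still leave at least one copy of every block, so the recovery step is re-replicating, not restoring lost data.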
Demo
Closing ...
blog.cloudera.com
References & Source
http://www.tutorialspoint.com/hadoop/
http://www.wired.com/2013/02/the-decades-that-invented-the-future-part-11-2001-2010/
http://www.slideshare.net/BernardMarr/140228-big-data-slide-share/3-The_basic_idea_behind_the
https://www.youtube.com/watch?v=HqsBensINkE
http://www.bogotobogo.com/Hadoop/BigData_hadoop_Install_on_ubuntu_single_node_cluster.php
http://noviardisyamsuir.blogspot.co.id/2016/03/hadoop-mapreduce-adalah.html
http://www.slideshare.net/lynnlangit/hadoop-mapreduce-fundamentals-21427224/5-What_types_of_business_problems
https://blog.cloudera.com/blog/2013/08/how-to-select-the-right-hardware-for-your-new-hadoop-cluster/
Thank you!


Editor's Notes

  1. The rapid development of technology, devices, and communication media is directly proportional to the amount of data produced by humankind. From the formation of the earth until 2003, when internet cafés were still quiet and the internet was still something unfamiliar, humankind had produced 5 billion GB of data. In the following years Friendster, Facebook, and Twitter appeared, and new devices such as the iPod and GPRS-equipped Nokia phones emerged, so people started using the internet. Eight years later BlackBerry was booming, along with WhatsApp and Twitter, and 5 billion GB could be produced in 2 days, even though people were still frugal with their mobile data packages at the time. Android spread a few years after that, the number of users grew, people got used to mobile data packages, and eventually 5 billion GB of data could be produced in just 10 minutes.
  2. Simple activities like listening to music or reading a book are now generating data. Digital music players and eBooks collect data on our activities. Your smartphone collects data on how you use it, and your web browser collects information on what you are searching for. Your credit card company collects data on where you shop, and your shop collects data on what you buy. It is hard to imagine any activity that does not generate data. Our conversations are now digitally recorded. It all started with emails, but nowadays most of our conversations leave a digital trail. Just think of all the conversations we have on social media sites like Facebook or Twitter. Even many of our phone conversations are now digitally recorded. Just think about all the pictures we take on our smartphones or digital cameras. We upload and share hundreds of thousands of them on social media sites every second. The increasing number of CCTV cameras capture video, and we upload hundreds of hours of video to YouTube and other sites every minute. We are increasingly surrounded by sensors that collect and share data. Take your smartphone: it contains a global positioning sensor to track exactly where you are every second of the day, and it includes an accelerometer to track the speed and direction at which you are travelling. We now have sensors in many devices and products. We now have smart TVs that are able to collect and process data, as well as smart watches, smart fridges, and smart alarms. The Internet of Things, or Internet of Everything, connects these devices so that, e.g., the traffic sensors on the road send data to your alarm clock, which will wake you up earlier than planned because the blocked road means you have to leave earlier to make your 9am meeting…
  3. Volume refers to the vast amounts of data generated every second. We are not talking about terabytes but zettabytes or even brontobytes. If we take all the data generated in the world between the beginning of time and 2008, the same amount of data will soon be generated every minute. New big data tools use distributed systems so that we can store and analyse data across databases that are dotted around anywhere in the world. Velocity refers to the speed at which new data is generated and the speed at which data moves around. Just think of social media messages going viral in seconds. Technology now allows us to analyse data while it is being generated (sometimes referred to as in-memory analytics), without ever putting it into databases. Variety refers to the different types of data we can now use. In the past we focused only on structured data that neatly fitted into tables or relational databases, such as financial data. Yet an estimated 80% of the world’s data is unstructured (text, images, video, voice, etc.). With big data technology we can now analyse and bring together data of different types, such as messages, social media conversations, photos, sensor data, and video or voice recordings.
  4. Limitation: this approach works well for smaller volumes of data that can be accommodated by standard database servers, or up to the limit of the processor that is processing the data. But when it comes to huge amounts of data, processing it through a traditional database server is a really tedious task.
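The velocity point in note 3, analysing data while it is being generated instead of first loading it into a database, can be illustrated with a minimal Python generator that keeps a rolling average over a stream. It is an illustrative sketch, not a specific in-memory analytics product; the message counts and the window size are hypothetical.

```python
from collections import deque


def rolling_average(stream, window):
    """Analyse data as it arrives: yield the mean of the last `window` values
    without ever storing the whole stream."""
    recent = deque(maxlen=window)  # only the newest `window` values are kept
    total = 0.0
    for value in stream:
        if len(recent) == window:
            total -= recent[0]  # this value is evicted by the append() below
        recent.append(value)
        total += value
        yield total / len(recent)


# Hypothetical stream of per-second message counts (a generator, so nothing is stored up front).
incoming = (count for count in [10, 20, 30, 40])
averages = list(rolling_average(incoming, window=2))
print(averages)  # [10.0, 15.0, 25.0, 35.0]
```

Memory use depends only on the window size, not on how long the stream runs, which is the property that makes this style of analysis workable for high-velocity data.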