Sqoop: Import with Append Mode and Last Modified Mode
Getting data from a MySQL database: import
$ sqoop import --connect jdbc:mysql://localhost/db_1 --username root --password root --table student_details --split-by ID --target-dir studentdata
$ hadoop fs -ls studentdata/
Now we can see multiple part-m files in the folder. This is because Sqoop uses multiple map tasks to process the import, and each mapper writes a subset of the rows. By default Sqoop uses 4 mappers, i.e. the output is divided across that number of files.
Use the cat command to view the contents of each mapper's output:
part-m-00000 contains rows such as: abc 12 TX
part-m-00001: no data
part-m-00002 contains rows such as: ecg 56 FL
part-m-00003: no data
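For reference, the commands behind the outputs above are standard HDFS shell commands (the wildcard form is just a convenient shortcut, not from the original slides):
$ hadoop fs -cat studentdata/part-m-00000
$ hadoop fs -cat studentdata/part-m-*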
 The reason is that we are not splitting on a primary key, which results in unbalanced tasks where some mappers process more data than others.
By default, Sqoop uses the table's primary key to split the rows.
Alternatively, we can address this by explicitly declaring which column will be used to split the rows among the mappers.
#explicitly declaring the split column
$ sqoop import --connect jdbc:mysql://localhost/db_1 --username root --password root --table student_details --target-dir studentdata --split-by ID
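To check whether the split is now more balanced, you can compare the per-file sizes (an optional sanity check using standard HDFS commands, not part of the original slides):
$ hadoop fs -du -h studentdata/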
Now using a primary key
#add a primary key to our table in the database
mysql> ALTER TABLE student_details
       ADD PRIMARY KEY (ID);
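To confirm the primary key was added, you can describe the table (standard MySQL, shown here only as an optional check):
mysql> DESCRIBE student_details;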
#now use the same query to load the same data from the MySQL database into HDFS
$ sqoop import --connect jdbc:mysql://localhost/db_1 --username root --password root --table student_details --target-dir student_details1
Note: it will throw an error if a student_details1 folder already exists in the target location.
#check the data
$ hadoop fs -ls /user/hduser/student_details1
$ hadoop fs -cat /user/hduser/student_details1/part-m-00000
#list the databases and tables
$ sqoop list-databases --connect jdbc:mysql://localhost/ --username root --password root
$ sqoop list-tables --connect jdbc:mysql://localhost/db_1 --username root --password root
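As a side note (not from the original slides), Sqoop can also prompt for the password with -P, or read it from a protected file with --password-file, which keeps it off the command line, for example:
$ sqoop list-tables --connect jdbc:mysql://localhost/db_1 --username root -P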
Controlling Parallelism
We know that Sqoop by default uses map tasks to process its jobs. However, Sqoop also provides the flexibility to change the number of map tasks depending on our job requirements.
 Controlling the number of map tasks, i.e. the degree of parallel processing, helps control the load on our database.
 More mappers does not always mean faster performance. The optimal number depends on the type of database, the hardware of the nodes (systems), and the amount of concurrent job requests.
$ sqoop import --connect jdbc:mysql://localhost/db_1 --username root --password root --table student_details --num-mappers 1 --target-dir student_details2
Alternatively,
$ sqoop import --connect jdbc:mysql://localhost/db_1 --username root --password root --table emp -m 1 --target-dir emp
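A minimal sketch (reusing the ID column from earlier; student_details3 is a hypothetical target directory) that combines an explicit split column with a custom mapper count, so the parallel slices stay reasonably balanced:
$ sqoop import --connect jdbc:mysql://localhost/db_1 --username root --password root --table student_details --split-by ID --num-mappers 2 --target-dir student_details3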
Now, if we want to import only the updated data
 This can be done using two modes:
1) Append Mode
2) Last-Modified Mode
1) Append Mode:
First, add new rows:
mysql> use db_1;
mysql> insert into student_details(ID,Name,Location) values (44,'albert','CA');
mysql> insert into student_details(ID,Name,Location) values (55,'Zayn','MI');
#note: we need an integer data type on the check column to detect the last value in append mode
mysql> ALTER TABLE student_details
       MODIFY ID int(30);
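Before running the incremental import, you can preview which rows are newer than the last imported value (an optional check, assuming 33 was the highest ID already imported to HDFS):
mysql> SELECT * FROM student_details WHERE ID > 33;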
Then import:
$ sqoop import --connect jdbc:mysql://localhost/db_1 --username root --password root --table student_details --split-by ID --incremental append --check-column ID --last-value 33 --target-dir Appendresults/
Here --incremental append sets the import mode, --check-column ID names the column to inspect, and --last-value 33 tells Sqoop to import only rows whose ID is greater than 33.
Therefore, Append Mode is used only when the table is populated with new rows.
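To verify what was appended, list and view the new output files with standard HDFS commands:
$ hadoop fs -ls Appendresults/
$ hadoop fs -cat Appendresults/part-m-*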
2) Last-Modified Mode: is used to overcome Append Mode's limitation of handling only new rows. Hence it is suitable when the table receives both new rows and updates to existing rows and columns.
Whenever the table is updated, last-modified mode uses the timestamp attached to each update to import only the newly modified rows and columns into HDFS.
#add a timestamp column
mysql> ALTER TABLE student_details
       ADD COLUMN updated_at
       TIMESTAMP DEFAULT CURRENT_TIMESTAMP
       ON UPDATE CURRENT_TIMESTAMP;
#add a new column
mysql> ALTER TABLE student_details
       ADD COLUMN YEAR char(10)
       AFTER Location;
#add values to the new column
mysql> INSERT INTO student_details(YEAR) VALUES (2010);
OR
mysql> UPDATE student_details
       SET YEAR = 2010
       WHERE Location = 'FL';
…repeat again for the two rows
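To confirm that updated_at changed for the modified rows, a quick check (standard SQL, not part of the original slides):
mysql> SELECT ID, Name, Location, YEAR, updated_at FROM student_details;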
#then import
$ sqoop import --connect jdbc:mysql://localhost/db_1 --username root --password root --table student_details --split-by ID --incremental lastmodified --check-column updated_at --last-value "2017-01-15 13:00:28" --target-dir lmresults/
Here --incremental lastmodified sets the import mode, and --check-column updated_at compares the timestamp column against --last-value "2017-01-15 13:00:28", so only rows modified after that time are imported.
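Sqoop's lastmodified mode can also fold updates to already-imported rows into the existing output by adding --merge-key with the table's primary key; the command below is only a sketch of that variant, not from the original slides:
$ sqoop import --connect jdbc:mysql://localhost/db_1 --username root --password root --table student_details --incremental lastmodified --check-column updated_at --last-value "2017-01-15 13:00:28" --merge-key ID --target-dir lmresults/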
Append Mode vs Last-Modified Mode
 Each mode has its own advantages over the other's limitations.
In append mode you do not have to delete the existing output folder in HDFS; Sqoop creates another file and names it sequentially by itself.
But in last-modified mode Sqoop needs the existing output HDFS folder to be empty.
Also, in append mode Sqoop imports only the data after the specified last value, whereas in last-modified mode it takes all the newly modified rows and columns into account.
Next
 In real life it may not be efficient or practical to remember the last value each time we run Sqoop. To overcome this we have another feature called a Sqoop job (a brief preview is sketched below).
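A saved Sqoop job stores the incremental state, including the last value, so you do not have to track it between runs; the job name student_incr below is a hypothetical example:
$ sqoop job --create student_incr -- import --connect jdbc:mysql://localhost/db_1 --username root --password root --table student_details --incremental append --check-column ID --last-value 33 --target-dir Appendresults/
$ sqoop job --exec student_incr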
Rupak Roy