
Moving Data Between Exadata and Hadoop

  1. Moving Data Between Oracle Exadata and Hadoop. Fast.
     Tanel Põder, Enkitec
     http://www.enkitec.com
     http://blog.tanelpoder.com
  2. Intro: About me
     • Tanel Põder
       • Former Oracle Database Performance geek
       • Present Exadata Performance geek
       • Future Hadoop Performance geek
     • My Exadata experience
       • 2009 ... 2013
       • Exadata V1 … X3
       • Multi-rack Exadatas
       • Mixed-rack Exadatas
     • My Hadoop experience
       • Ask again next year ;-)
     • Expert Oracle Exadata book (with Kerry Osborne and Randy Johnson of Enkitec)
  3. About Enkitec
     • Enkitec
       • North America
       • EMEA
     • 100+ staff
       • In US, Europe
       • Consultants with Oracle experience of 15+ years on average
     • What makes us so awesome
       • 200+ Exadata implementations to date
     • Enkitec Exa-Lab
       • We have 3 Exadatas (V2, X2-2, X3-2)
       • Full-Rack Big Data Appliance
       • Exalytics
       • ODA
     • Everything Exa: Planning/PoC, Implementation, Consolidation, Migration, Backup/Recovery, Patching, Troubleshooting, Performance, Capacity, Training
  4. Our exa-lab environment
     • Exadata V2 (quarter rack)
     • Exadata X2-2 (quarter rack)
     • Exadata X3-2 (quarter rack)
     • Big Data Appliance (full rack)
     • Exalytics, ODA, etc
     [Diagram label: IB]
  5. Disclaimers
     • The numbers shown here are not from "real" benchmarks
       • The actual data loading speeds vary greatly when using real data
       • (column count, datatypes etc etc)
     • This is not a "how to configure hadoop tools" session
       • ...it's all about performance
  6. (Too) Many Data Loading Options
     • Pull Hadoop data into Oracle
       • Oracle SQL Connector for HDFS
       • Oracle Heterogeneous Services + Hive/Impala ODBC
       • Fuse-mounted HDFS + external table load
     • Push Hadoop data into Oracle
       • Sqoop
       • Oracle Loader for Hadoop
     • Pull Oracle data into Hadoop
       • Sqoop
       • Tom Kyte's flat unloader (to Hadoop local filesystem + copy to HDFS)
  7. Oracle SQL Connector for HDFS

         CREATE TABLE "TANEL"."TERASORT_1T_100"
         (
           "TOKEN_TYPE"  VARCHAR2(4000),
           "DATE_MONTH"  VARCHAR2(4000),
           "TOKEN_COUNT" VARCHAR2(4000),
           "TOKEN_VALUE" VARCHAR2(4000)
         )
         ORGANIZATION EXTERNAL
         (
           TYPE ORACLE_LOADER
           DEFAULT DIRECTORY "EXT_HDFS_TEST_DIR"
           ACCESS PARAMETERS
           (
             RECORDS DELIMITED BY 0X'0A'
             PREPROCESSOR "OSCH_BIN_PATH":'hdfs_stream'
             FIELDS TERMINATED BY 0X'3058273927'
             (
               "TOKEN_TYPE"  CHAR(4000),
               "DATE_MONTH"  CHAR(4000),
               "TOKEN_COUNT" CHAR(4000),
               "TOKEN_VALUE" CHAR(4000)
             )
           )
           LOCATION
           (
             'osch-tanel-00000',
             'osch-tanel-00001',
             'osch-tanel-00002',
             'osch-tanel-00003'
           )
         )
         ...

     • Visible to Oracle as an External Table. Parallelizable. Insert select, CTAS
     • The PREPROCESSOR program hdfs_stream is a java program capable of reading/streaming files from HDFS
     • The Oracle SQL Connector data "location pointer" files (pointing to 1 TB of data)
  8. OSCH data location files

         <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
         <locationFile>
           <header>
             <version>1.0</version>
             <fileName>osch-20130708020324-4644-1</fileName>
             <createDate>2013-07-08T14:03:24</createDate>
             <publishDate>2013-07-08T02:03:24</publishDate>
             <productName>Oracle SQL Connector for HDFS Release 2.1.0 - Production</productName>
             <productVersion>2.1.0</productVersion>
           </header>
           <uri_list>
             <uri_list_item size="10000000000" compressionCodec="">
               hdfs://enkbda-ns/user/acolvin/terasort/part-00000
             </uri_list_item>
             <uri_list_item size="10000000000" compressionCodec="">
               hdfs://enkbda-ns/user/acolvin/terasort/part-00006
             </uri_list_item>
             <uri_list_item size="10000000000" compressionCodec="">
               hdfs://enkbda-ns/user/acolvin/terasort/part-00008
             </uri_list_item>
             <uri_list_item size="10000000000" compressionCodec="">
               hdfs://enkbda-ns/user/acolvin/terasort/part-00014
             </uri_list_item>
             <uri_list_item size="10000000000" compressionCodec="">
               hdfs://enkbda-ns/user/acolvin/terasort/part-00016
             </uri_list_item>
         ...

     • Each "location pointer" file the external table loader uses points to one or more actual HDFS files
     • (this config file is edited for formatting purposes)
  9. Testing Oracle SQL Connector for HDFS
     • CREATE TABLE target AS SELECT /*+ PARALLEL */ * FROM terasort_1t;
     • Only 75 MB per second?
  10. Where is your bottleneck?
      [Diagram: Hadoop Cluster (HDFS, MR jobs, MapReduce CPU) -> I/O -> Network -> Oracle Database (PX Slaves CPU, Storage I/O)]
      • Hadoop cluster "computation": decompression, text file parsing, datatype conversion
      • Network: bandwidth / throughput / configuration
      • Oracle Database: text file parsing? Datatype conversion? HCC compression? DB waits, contention?
      • The only way to know is to measure!
  11. Testing Oracle SQL Connector for HDFS
  12. Unbalanced Parallel Slave activity?
  13. Increase Max Allowed External Table Parallelism

          CREATE TABLE terasort_1t_100 (
          ...
          ORGANIZATION EXTERNAL
          (
            TYPE ORACLE_LOADER
            DEFAULT DIRECTORY "EXT_HDFS_TEST_DIR"
            ...
            PREPROCESSOR "OSCH_BIN_PATH":'hdfs_stream'
            ...
            LOCATION (
                'osch-tanel-00000'
              , 'osch-tanel-00001'
              , 'osch-tanel-00002'
              , 'osch-tanel-00003'
              , 'osch-tanel-00004'
              , 'osch-tanel-00005'
              , 'osch-tanel-00006'
              , 'osch-tanel-00007'
              , 'osch-tanel-00008'
              , 'osch-tanel-00009'
              , 'osch-tanel-00010'
              ...
              , 'osch-tanel-00098'
              , 'osch-tanel-00099'
            )
          ...

      • Solution: Create more "location pointer" files. 100 "location pointer files", each pointing to a single HDFS file (in my test)
      • This allows up to 100 slaves in parallel, accessing one HDFS stream each.
  14. More "fine-grained" OSCH data location files

          $ cat osch-tanel-00099

          <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
          <locationFile>
            <header>
              <version>1.0</version>
              <fileName>osch-tanel-00099</fileName>
              <createDate>2013-07-08T14:03:24</createDate>
              <publishDate>2013-07-08T02:03:24</publishDate>
              <productName>Oracle SQL Connector for HDFS Release 2.1.0 - Production</productName>
              <productVersion>2.1.0</productVersion>
            </header>
            <uri_list>
              <uri_list_item size="10000000000" compressionCodec="">
                hdfs://enkbda-ns/user/acolvin/terasort/part-00099
              </uri_list_item>
            </uri_list>
          </locationFile>

          $ ls -l osch-tanel*
          -rwxr-xr-x 1 nobody users 598 Sep 24 12:07 osch-tanel-00000
          -rwxr-xr-x 1 nobody users 598 Sep 24 12:07 osch-tanel-00001
          -rwxr-xr-x 1 nobody users 598 Sep 24 12:07 osch-tanel-00002
          -rwxr-xr-x 1 nobody users 598 Sep 24 12:07 osch-tanel-00003
          ...
          -rwxr-xr-x 1 nobody users 598 Sep 24 12:07 osch-tanel-00099

      • 100 files, allowing up to 100 HDFS streams in parallel.
      • With fewer PX slaves, each slave can access multiple files sequentially.
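The effect of the number of "location pointer" files on the achievable parallelism can be illustrated with a small Python sketch. It is not the connector's own tooling (which normally generates these files); the function names, the simplified header and the round-robin distribution are assumptions made for illustration, and only the XML element names, file naming pattern and sizes are taken from the slides.

```python
from xml.sax.saxutils import escape


def spread_round_robin(hdfs_uris, location_file_count):
    """Distribute HDFS URIs over N location files (round-robin is an assumption)."""
    groups = [[] for _ in range(location_file_count)]
    for i, uri in enumerate(hdfs_uris):
        groups[i % location_file_count].append(uri)
    return groups


def build_location_file(file_name, hdfs_uris, size_bytes):
    """Return simplified location-file XML (header reduced to version and fileName)."""
    items = "\n".join(
        f'<uri_list_item size="{size_bytes}" compressionCodec="">\n'
        f"{escape(uri)}\n"
        f"</uri_list_item>"
        for uri in hdfs_uris
    )
    return (
        '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>\n'
        "<locationFile>\n"
        "<header>\n"
        "<version>1.0</version>\n"
        f"<fileName>{escape(file_name)}</fileName>\n"
        "</header>\n"
        f"<uri_list>\n{items}\n</uri_list>\n"
        "</locationFile>\n"
    )


# 100 terasort output files of 10,000,000,000 bytes each, as in the slides.
uris = [f"hdfs://enkbda-ns/user/acolvin/terasort/part-{i:05d}" for i in range(100)]

coarse = spread_round_robin(uris, 4)    # 4 location files: at most 4 parallel streams
fine = spread_round_robin(uris, 100)    # 100 location files: up to 100 parallel streams

location_files = {
    f"osch-tanel-{n:05d}": build_location_file(f"osch-tanel-{n:05d}", group, 10_000_000_000)
    for n, group in enumerate(fine)
}

print(len(coarse[0]), "HDFS files per location file with 4 location files")
print(len(fine[0]), "HDFS file per location file with 100 location files")
```

With 4 location files each pointer covers 25 HDFS files, so only 4 slaves can stream at once; with 100 location files each pointer covers a single HDFS file.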
  15. BDA -> Exadata X3-2 (16core/32thread) 1TB data load:
      • 500-600 MB/s load by single DB node (1-2 TB/hour)
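To put the throughput figures in perspective, the following Python sketch converts MB/s into TB/hour and into the time needed for the 1 TB test set. It assumes decimal units (1 MB = 10^6 bytes, 1 TB = 10^12 bytes, matching the 100 files of 10,000,000,000 bytes in the slides); the function names are made up for this example.

```python
ONE_TB = 1_000_000_000_000  # bytes, decimal units (assumption)


def tb_per_hour(mb_per_second):
    """Convert a sustained load rate in MB/s to TB/hour."""
    return mb_per_second * 1_000_000 * 3600 / ONE_TB


def hours_to_load(total_bytes, mb_per_second):
    """Hours needed to load total_bytes at the given sustained rate."""
    return total_bytes / (mb_per_second * 1_000_000) / 3600


# 75 MB/s is the initial result; 500-600 MB/s is the result after tuning.
for rate in (75, 500, 600):
    print(f"{rate} MB/s = {tb_per_hour(rate):.2f} TB/hour, "
          f"1 TB in {hours_to_load(ONE_TB, rate):.2f} hours")
```

Under these assumptions, 500-600 MB/s works out to roughly 1.8-2.2 TB/hour, whereas 75 MB/s would need about 3.7 hours for the same 1 TB.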
  16. BDA -> Exadata X3-2 (16core/32thread) 1TB data load:
      • Skewed/Unbalanced parallel execution: 4 slaves work for longer when others are done (3 x 32 + 4 = 100 files)
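The skew can be reproduced with a toy model: 100 equal-sized files handed out to 32 parallel slaves, each idle slave taking the next unprocessed file. This is a simplified sketch in Python, not a description of how Oracle actually assigns external table granules; the one-time-unit-per-file cost and the function name are assumptions.

```python
import heapq


def simulate(file_count, slave_count, time_per_file=1.0):
    """Greedy scheduling: the earliest-idle slave takes the next file.

    Returns the finish time of every slave.
    """
    finish = [0.0] * slave_count
    idle = [(0.0, s) for s in range(slave_count)]
    heapq.heapify(idle)
    for _ in range(file_count):
        t, s = heapq.heappop(idle)
        t += time_per_file
        finish[s] = t
        heapq.heappush(idle, (t, s))
    return finish


finish = simulate(file_count=100, slave_count=32)
full_rounds, leftover = divmod(100, 32)
longest = max(finish)
still_busy = sum(1 for t in finish if t == longest)

print(f"{full_rounds} full rounds, {leftover} leftover files")
print(f"{still_busy} slaves finish at t={longest}, the rest at t={min(finish)}")
```

In this model the load takes 4 time units instead of the ideal 100 / 32 = 3.125, because 28 slaves sit idle during the last round. A file count that is a multiple of the slave count (or many smaller files) would even this out.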
  17. Hadoop Cluster CPUs are idle?!
  18. Drilling deeper into the CPU usage

          SQL> @ostackprof 788 0.1 100

          Below is the stack prefix common to all samples:
          ------------------------------------------------------------------------
          Frame->function()
          ------------------------------------------------------------------------
          # 49 ->main()
          .... some lines snipped .....
          # 11 ->pextproc()
          # 10 ->spefmccallstd()
          # 9 ->spefcpfa()
          # 8 ->qxxqFetch()
          # 7 ->kpxsFetch()
          # 6 ->kpxsFetchField()
          # 5 ->kpxsFetchDriver()
          .... some lines snipped .....
          # -#--------------------------------------------------------------------
          # - Num.Samples -> in call stack()
          # ----------------------------------------------------------------------
          35 ->kudmxfe()->kudmdtp()->lxoSchPat()
          25 ->kudmxfe()->kudmdtp()->lxmfwdx()
          23 ->kudmxfe()->kudmdtp()->
           4 ->kpxsDoConvert()->OCIDirPathColArrayToStream()->kpudpcs_colArrayToStream()->kpudpcsf_intColArrayToStream()
           3 ->kudmxfe()->lxmfwdx()
           3 ->kudmxfe()->kudmrn()->kudmrt()
           2 ->qerxtCBFetch()->qerxtProcessRows()->qeaeCn1Serial()
           2 ->qerxtCBFetch()->qerxtProcessRows()->klxprParseRow()
           1 ->OCIDirPathColArrayReset()

      • 83% of time spent in datatype conversion (kudm)
      • 60% in lx* functions (string/datatype processing)
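The two percentages can be recomputed from the sample counts in the profile. The Python sketch below assumes that the third argument to ostackprof (100) is the total number of stack samples taken; the 83% and 60% figures then appear to correspond to the stacks passing through kudmdtp() and to the lx* leaf functions beneath it. The variable and function names are made up for this example.

```python
# (sample count, distinguishing part of the call stack) as listed in the profile
SAMPLES = [
    (35, "kudmxfe()->kudmdtp()->lxoSchPat()"),
    (25, "kudmxfe()->kudmdtp()->lxmfwdx()"),
    (23, "kudmxfe()->kudmdtp()"),
    (4, "kpxsDoConvert()->OCIDirPathColArrayToStream()"
        "->kpudpcs_colArrayToStream()->kpudpcsf_intColArrayToStream()"),
    (3, "kudmxfe()->lxmfwdx()"),
    (3, "kudmxfe()->kudmrn()->kudmrt()"),
    (2, "qerxtCBFetch()->qerxtProcessRows()->qeaeCn1Serial()"),
    (2, "qerxtCBFetch()->qerxtProcessRows()->klxprParseRow()"),
    (1, "OCIDirPathColArrayReset()"),
]

TOTAL_SAMPLES = 100  # assumption: taken from "@ostackprof 788 0.1 100"


def share(predicate):
    """Percentage of all samples whose stack frames satisfy the predicate."""
    hits = sum(n for n, stack in SAMPLES if predicate(stack.split("->")))
    return 100.0 * hits / TOTAL_SAMPLES


in_kudmdtp = share(lambda frames: "kudmdtp()" in frames)
in_lx_under_kudmdtp = share(
    lambda frames: "kudmdtp()" in frames and frames[-1].startswith("lx")
)

print(f"{in_kudmdtp:.0f}% of samples pass through kudmdtp()")
print(f"{in_lx_under_kudmdtp:.0f}% of samples end in an lx* function under kudmdtp()")
```

The listed stacks add up to 98 samples, so a couple of samples presumably fall into the snipped part of the output.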
  19. Datatype Conversion is CPU hungry!!!
      • You can offload the "preprocessing and datatype conversion" to the Hadoop cluster CPUs with the Oracle Loader for Hadoop!
  20. Oracle Loader for Hadoop
      [Diagram: Hadoop Cluster (HDFS, MR jobs, MapReduce CPU) -> I/O -> Oracle Database (DB Process, Storage I/O)]
      • Hadoop cluster: with OCI/DataPump it's possible to convert data to Oracle native format
      • Load methods: Array insert (JDBC), Direct Path Load (OCI), Create DataPump file (load via ext table)
      • Already pre-converted data is sent to Oracle
      • Oracle Database: no datatype conversion needed. HCC compression? DB waits, contention?
  21. • Source: High Performance Connectors for Load and Access of Data from Hadoop to Oracle Database
        • June 2012
        • http://www.oracle.com/technetwork/bdc/hadoop-loader/connectors-hdfs-wp-1674035.pdf
      • Based on earlier tests, these numbers are plausible (although your mileage will vary depending on the data you convert and load)
  22. Oracle Loader for Hadoop
      • Can preprocess and convert datatypes to Oracle "native" format using Hadoop cluster's CPU cycles
        • DataPump format
        • OCI Direct Path load format
      • Each Reducer in Hadoop connects to Oracle DB with a separate session (OCI/JDBC)
        • So OCI direct path loads must be done into partitioned tables!
        • Otherwise you'll get TM enqueue contention
      • Oracle Loader takes care of the distribution
        • As long as you have enough reducers configured
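Why the reducer count matters can be shown with a purely conceptual Python sketch: rows are routed so that all rows of one table partition end up on the same reducer, and therefore in the same database session. This is not how Oracle Loader for Hadoop is implemented; the monthly partitioning scheme, the modulo routing and all names here are hypothetical.

```python
from collections import defaultdict


def route_rows(rows, partition_of, reducer_count):
    """Return {reducer_id: {partition_id: [rows]}}.

    Every partition is handled by exactly one reducer, so no two
    sessions load into the same partition at the same time.
    """
    plan = defaultdict(lambda: defaultdict(list))
    for row in rows:
        partition_id = partition_of(row)
        reducer_id = partition_id % reducer_count
        plan[reducer_id][partition_id].append(row)
    return plan


def month_partition(row):
    # Hypothetical scheme: 12 partitions, one per calendar month.
    return int(row["date_month"].split("-")[1]) - 1


rows = [{"date_month": f"2013-{m:02d}", "token_count": m} for m in range(1, 13)]

enough = route_rows(rows, month_partition, reducer_count=12)
too_few = route_rows(rows, month_partition, reducer_count=4)

print(max(len(parts) for parts in enough.values()), "partition per reducer with 12 reducers")
print(max(len(parts) for parts in too_few.values()), "partitions per reducer with 4 reducers")
```

With as many reducers as partitions, each session loads one partition. With fewer reducers, each one has to work through several partitions, which limits the parallelism of the load.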
  23. References
      • OTN Big Data Connectors page
        • http://www.oracle.com/technetwork/bdc/big-data-connectors/overview/index.html
      • Oracle Big Data Connectors User's Guide
        • http://docs.oracle.com/cd/E41604_01/doc.22/e41238/toc.htm
      • Tools
        • dstat
          • http://dag.wieers.com/home-made/dstat/
        • SwingBench CPU Monitor
          • http://www.dominicgiles.com/cpumonitor.html
  24. Thanks!!!
      • Questions?
        • Ask now :)
      • Or Contact
        • tanel@tanelpoder.com
        • http://blog.tanelpoder.com
        • @tanelpoder
      • http://www.enkitec.com
        • We rock! ;-)
