Apache Tajo Quick Start

Jinho Kim (jinossy@gmail.com)

Agenda

•  Introduction to Tajo
•  Tajo Quick Start
•  Introduction to Text files

About me

•  Jinho Kim
   –  Senior Research Engineer, Gruter Corp (2011.5 ~)
   –  Full-time contributor to Apache Tajo (2013.6 ~)
   –  Apache Tajo PMC member and committer (2013.3 ~)
•  Contacts
   –  Email: jhkim AT apache.org
   –  LinkedIn: http://linkedin.com/in/jinossy/
   –  Twitter: @jinossy

INTRODUCTION TO TAJO

Apache Tajo

•  Open-source big data warehouse (also called SQL-on-Hadoop) system
•  Apache Top-level project since March 2014
•  Supports SQL standards
•  Low-latency and long-running batch queries
•  0.9.0 released in Oct 2014

Hadoop ecosystem Integration

•  De facto standard file format support
   –  Parquet, RCFile, SequenceFile, and Text files
•  HCatalog support
   –  Enables Tajo to access existing tables used in Hive and others
•  YARN support
   –  Tajo can run on a YARN cluster by using Apache Slider

Overall Architecture

[Figure: overall architecture diagram. Components: Client (JDBC, TSql, Web UI); Master Server (HA) running TajoMaster; CatalogStore (DBMS, HCatalog); three Slave Servers, each running a TajoWorker (QueryMaster, Local Query Engine, StorageManager, HDFS, Local FileSystem). Arrow labels: Submit a Query, Manage metadata, Allocate a query, Send task & monitor.]

TAJO QUICK START

This document is based on the development branch, which had not been officially released as of the writing date. The link below is a tarball prepared in advance.

http://people.apache.org/~jhkim/tajo-0.10.0-SNAPSHOT.tar.gz

Installing Tajo

•  Requirements
   –  Linux
   –  JDK 1.6 or 1.7
   –  Hadoop 2.3.0 or higher
      •  $ wget http://archive.apache.org/dist/hadoop/common/hadoop-2.6.0/hadoop-2.6.0.tar.gz
      •  $ tar xvzf hadoop-2.6.0.tar.gz
•  From a Release Tarball
   –  $ wget http://archive.apache.org/dist/tajo/tajo-X.X.X/tajo-X.X.X.tar.gz
   –  $ tar xvzf tajo-X.X.X.tar.gz

Installing Tajo

•  Requirements
   –  Maven 3
   –  Protocol Buffers 2.5.0
•  Building from source code
   –  $ git clone https://github.com/apache/tajo.git tajo
   –  $ cd tajo
   –  $ mvn clean install -Pdist -DskipTests -Dtar
   –  $ cp tajo-dist/target/tajo-X.X.X-SNAPSHOT.tar.gz {TAJO_HOME}
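The build requirements name a specific Protocol Buffers release, so it can help to confirm which `protoc` is on the PATH before starting the Maven build. The following is a minimal sketch, not part of the original slides: the function name `is_supported_protoc` is hypothetical, and it assumes `protoc --version` prints a line of the form `libprotoc 2.5.0`.

```shell
#!/bin/sh
# Hypothetical helper (not from the slides): succeeds only when the given
# `protoc --version` output reports the 2.5.0 release named in the requirements.
is_supported_protoc() {
  case "$1" in
    "libprotoc 2.5.0") return 0 ;;
    *) return 1 ;;
  esac
}

# Report on the protoc found on PATH, if any, without failing the script.
if command -v protoc >/dev/null 2>&1; then
  found=$(protoc --version)
  if is_supported_protoc "$found"; then
    echo "protoc 2.5.0 found"
  else
    echo "unexpected protoc version: $found"
  fi
else
  echo "protoc not found on PATH"
fi
```

Running this before `mvn clean install` only reports the installed version; it does not install or switch Protocol Buffers releases.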
  
Tajo Cluster Mode

•  Local mode
   –  A local mode Tajo instance can start up with a very simple configuration.
•  Fully distributed mode
   –  A fully distributed mode enables a Tajo instance to run on HDFS. In this mode, a number of Tajo workers run across the physical nodes where HDFS data nodes run.

Setting up a Local mode

•  Local mode
   –  Runs on a single machine without a Hadoop cluster; recommended when you mainly work with local files
•  SSH
   –  $ ssh-keygen -t rsa
   –  $ ssh-copy-id -i ~/.ssh/id_rsa.pub {hostname}
•  conf/tajo-env.sh
   –  export HADOOP_HOME={HADOOP_HOME}
   –  export JAVA_HOME={JAVA_HOME}
   –  export TAJO_WORKER_HEAPSIZE=1000
   –  # export TAJO_LOG_DIR=${TAJO_HOME}/logs
•  Launch a Tajo cluster
   –  $ $TAJO_HOME/bin/start-tajo.sh
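Because `conf/tajo-env.sh` depends on `JAVA_HOME` and `HADOOP_HOME` pointing at real installations, a quick preflight check before launching can catch a mistyped path. This is a sketch under assumptions, not a Tajo tool: the function name `check_tajo_env` is hypothetical, and it assumes a usable JDK has an executable `bin/java` under `JAVA_HOME`.

```shell
#!/bin/sh
# Hypothetical preflight helper (not shipped with Tajo): prints one line per
# problem and returns non-zero if either directory does not look usable.
check_tajo_env() {
  java_home=$1
  hadoop_home=$2
  status=0
  # An empty value must be rejected explicitly, otherwise /bin/java would be tested.
  if [ -z "$java_home" ] || [ ! -x "$java_home/bin/java" ]; then
    echo "JAVA_HOME has no executable bin/java: $java_home"
    status=1
  fi
  if [ -z "$hadoop_home" ] || [ ! -d "$hadoop_home" ]; then
    echo "HADOOP_HOME is not a directory: $hadoop_home"
    status=1
  fi
  return $status
}

# Check the current shell environment; problems are printed, not fatal.
if check_tajo_env "${JAVA_HOME:-}" "${HADOOP_HOME:-}"; then
  echo "environment looks ready for start-tajo.sh"
fi
```

The check only inspects paths; it does not verify the JDK or Hadoop versions listed in the requirements.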
  
Setting up a Local mode - Optional

•  tajo.rootdir (tajo-site.xml)
   –  Directory that stores data such as the warehouse and system directories
•  tajo.worker.tmpdir.locations (tajo-site.xml)
   –  Directory that stores intermediate data needed during query execution

<property>
  <name>tajo.rootdir</name>
  <value>file:///tajo/meetup/warehouse</value>
  <description>Base directory including system directories.</description>
</property>
<property>
  <name>tajo.worker.tmpdir.locations</name>
  <value>/tmp/tajo-${user.name}/tmpdir</value>
  <description>A base for other temporary directories.</description>
</property>

Setting up a Fully distributed mode

•  conf/tajo-site.xml
   –  RPC settings required for connections between the master and the workers

<property>
  <name>tajo.rootdir</name>
  <value>hdfs://hostname:port/tajo</value>
</property>
<property>
  <name>tajo.master.umbilical-rpc.address</name>
  <value>hostname:26001</value>
</property>
<property>
  <name>tajo.master.client-rpc.address</name>
  <value>hostname:26002</value>
</property>
<property>
  <name>tajo.resource-tracker.rpc.address</name>
  <value>hostname:26003</value>
</property>
<property>
  <name>tajo.catalog.client-rpc.address</name>
  <value>hostname:26005</value>
</property>
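Since the four RPC properties differ only in name and port, they can be generated for a given master host rather than typed by hand. The sketch below is illustrative only: the helper names `tajo_property` and `tajo_rpc_properties` and the host `master01.example.com` are hypothetical, while the property names and ports are the ones listed on this slide.

```shell
#!/bin/sh
# Hypothetical helper: print one <property> element for tajo-site.xml.
tajo_property() {
  printf '<property>\n  <name>%s</name>\n  <value>%s</value>\n</property>\n' "$1" "$2"
}

# Hypothetical helper: print the RPC address properties from the slide
# for the given master host name.
tajo_rpc_properties() {
  master=$1
  tajo_property tajo.master.umbilical-rpc.address "$master:26001"
  tajo_property tajo.master.client-rpc.address "$master:26002"
  tajo_property tajo.resource-tracker.rpc.address "$master:26003"
  tajo_property tajo.catalog.client-rpc.address "$master:26005"
}

# Example host name (an assumption); paste the output into conf/tajo-site.xml.
tajo_rpc_properties master01.example.com
```

The output is a fragment to place inside the existing `tajo-site.xml`; `tajo.rootdir` still has to be set separately because its HDFS host and port are site-specific.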
  
Setting up a Fully distributed mode

•  SSH
   –  All Tajo workers

Tajo Seoul Meetup-201501