
MapR Tutorial Series

This tutorial series highlights some features of MapR, a converged data platform.

Published in: Technology

  1. MapR Learning Guide. Selvaraaju Murugesan, May 6, 2017
  2. Storage Pool: MapR-FS groups disks into storage pools, usually made up of two or three disks. The Stripe Width parameter lets you configure the number of disks per storage pool. Each node in a MapR cluster can support up to 36 storage pools. Use the mrconfig command to create, remove, and manage storage pools, disk groups, and disks.
  3. Example 1: If you have 11 disks in a node, how many storage pools will be created by default?
  4. Example 1 Solution: With 11 disks in a node, the default is 3 storage pools of 3 disks each and 1 storage pool of 2 disks.
  5. Example 2: If you have 9 disks in a node, how many storage pools will be created by default?
  6. Example 2 Solution: With 9 disks in a node, the default is 3 storage pools of 3 disks each.
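The default grouping used in the two examples above can be sketched as a small model, assuming the default stripe width of 3 disks per pool (the function is illustrative, not a MapR API):

```python
def default_storage_pools(num_disks, stripe_width=3):
    """Model of MapR-FS default disk grouping: fill pools of
    `stripe_width` disks; any remainder forms a smaller pool."""
    pools = []
    while num_disks > 0:
        take = min(stripe_width, num_disks)
        pools.append(take)
        num_disks -= take
    return pools

# Example 1: 11 disks -> three pools of 3 and one pool of 2
print(default_storage_pools(11))  # [3, 3, 3, 2]
# Example 2: 9 disks -> three pools of 3
print(default_storage_pools(9))   # [3, 3, 3]
```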
  7. Tradeoffs: If a disk fails in a storage pool, the entire storage pool is taken offline and MapR automatically begins data replication. More disks per pool means more data to replicate after a disk failure. The ideal scenario is 3 disks per storage pool. Use disk drives of the same size and speed within a storage pool for good performance.
  8. List of Ports: CLDB 7221, MCS 8443, MapR Installer 9443, Hue 8888, Drill 8047, ZooKeeper 5181, JobHistoryServer 19888.
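For scripting, the port list above can be kept as a plain mapping (illustrative only; verify the ports against your own cluster's configuration):

```python
# Default MapR service ports from the table above.
DEFAULT_PORTS = {
    "CLDB": 7221,
    "MCS": 8443,
    "MapR Installer": 9443,
    "Hue": 8888,
    "Drill": 8047,
    "ZooKeeper": 5181,
    # 19888 is the MapReduce JobHistory Server web UI port.
    "JobHistoryServer": 19888,
}

print(DEFAULT_PORTS["CLDB"])  # 7221
```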
  9. Default Settings: If a disk fails, data replication starts immediately. If a node fails, data replication starts after an hour (60 minutes). The node maintenance timeout defaults to 1 hour, after which data replication starts (the timeout is configurable). To view or change the configuration, use the command maprcli config load. If the CLDB heartbeat exceeds 5 seconds, an alarm is raised and must be cleared manually. A secondary CLDB on a node will serve read operations.
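The re-replication timing defaults above can be summarized in a small helper (a sketch of the stated defaults, not a MapR API):

```python
def rereplication_delay_seconds(event, maintenance_timeout=3600):
    """Seconds before MapR begins re-replicating data, per the
    defaults above: immediately for a failed disk, after one hour
    for a failed node or a node taken down for maintenance."""
    if event == "disk_failure":
        return 0
    if event in ("node_failure", "maintenance"):
        return maintenance_timeout
    raise ValueError(f"unknown event: {event}")

print(rereplication_delay_seconds("disk_failure"))  # 0
print(rereplication_delay_seconds("node_failure"))  # 3600
```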
  10. CLDB: The name container holds the metadata for the files and directories in the volume, plus the first 64 KB of each file. Data containers and name containers can have different replication factors. Data replication happens at the volume level. For high availability, install ZooKeeper on more nodes. /opt/mapr/roles contains the list of configured services on a given node. Core files are copies of the contents of memory taken when certain anomalies are detected; they are located in /opt/cores, and the file name includes the name of the service that experienced the issue. When a core file is created, an alarm is raised.
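The name-container rule above (metadata plus the first 64 KB of each file) can be illustrated with a one-line model (the function name is hypothetical):

```python
def bytes_in_name_container(file_size_bytes, threshold=64 * 1024):
    """Per the slide: the name container stores the first 64 KB of
    each file; bytes beyond that live in data containers."""
    return min(file_size_bytes, threshold)

print(bytes_in_name_container(10 * 1024))    # 10240 (small file fits entirely)
print(bytes_in_name_container(1024 * 1024))  # 65536 (only the first 64 KB)
```

One consequence: a volume full of tiny files puts most of its data, not just metadata, in the name container.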
  11. ZooKeeper: To start ZooKeeper, run service mapr-zookeeper start. To stop it, run service mapr-zookeeper stop. To check its status, run service mapr-zookeeper qstatus. ZooKeeper should always be the first service that is started.
  12. MapR Commands: To list the services on a node: maprcli service list, or maprcli node list -columns id,ip,svc. To list CLDBs: maprcli node listcldbs. To find the CLDB master: maprcli node cldbmaster. To show node topology: maprcli node topo.
  13. Cluster Permissions: Log into the MCS (login); this level also includes permission to use the API and command-line interface, and grants read access on the cluster and its volumes. Start and stop services (ss). Create volumes (cv). Edit and view Access Control Lists, or permissions (a). Full control, which gives the user the ability to do everything except edit permissions (fc).
  14. Volume Permissions: Dump or back up the volume (dump). Mirror or restore the volume (restore). Modify volume properties, including creating and deleting snapshots (m). Delete the volume (d). View and edit volume permissions (a). Perform all operations except view and edit volume permissions (fc).
  15. MapR Utilities: configure.sh sets up a cluster node and changes services such as ZooKeeper, CLDB, etc. disksetup formats specified disks for use by MapR storage. fsck finds and fixes inconsistencies in the filesystem to make the metadata consistent on the next load of the storage pool. gfsck performs a scan and repair operation on a cluster, volume, or snapshot.
  16. MapR Utilities (continued): mrconfig creates, removes, and manages storage pools, disk groups, and disks, and provides information about containers. mapr-support-collect.sh collects diagnostic information from all nodes in the cluster. mapr-support-dump.sh collects node- and cluster-level information about the node where the script is invoked. cldbguts monitors the activity of the CLDB.
  17. NTP Server: All nodes should synchronize to one internal NTP server. Use the systemctl command to manage the NTP service and the ntpq command to query it.
  18. Logs: Centralised logging keeps logs for 30 days by default and uses symbolic links to the logs. Local logging keeps logs for 3 hours by default. YARN logs expire after 3 hours; the timer starts when the job begins. Logs are stored in /opt/mapr/logs and deleted after 10 days by default. Change these settings in the yarn-site.xml file; retention times are given in seconds.
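Since yarn-site.xml expects retention times in seconds, converting the defaults above is simple arithmetic (the property name in the comment is the stock YARN one; verify it against your configuration):

```python
# Convert the log-retention defaults above into the seconds
# that yarn-site.xml expects, e.g. for
# yarn.log-aggregation.retain-seconds.
HOUR = 3600
DAY = 24 * HOUR

yarn_log_retention = 3 * HOUR    # YARN logs: 3 hours
local_log_retention = 10 * DAY   # /opt/mapr/logs: 10 days

print(yarn_log_retention)   # 10800
print(local_log_retention)  # 864000
```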
  19. Space Requirements: /opt: 128 GB. /tmp: 10 GB. /opt/mapr/zkdata: 500 MB. Swap space: 110% of physical memory, with a minimum of 24 GB and a maximum of 128 GB. Use LVM for boot drives.
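The swap-space rule above (110% of physical memory, clamped to the 24 to 128 GB range) can be sketched as:

```python
def swap_space_gb(physical_memory_gb):
    """Swap sizing per the guideline above: 110% of RAM,
    but never below 24 GB or above 128 GB."""
    return min(max(physical_memory_gb * 11 / 10, 24), 128)

print(swap_space_gb(16))   # 24 (the 24 GB floor applies)
print(swap_space_gb(64))   # 70.4
print(swap_space_gb(256))  # 128 (the 128 GB ceiling applies)
```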
  20. Volume Quota: Once the advisory quota is reached, an alarm is raised. Once the hard quota is reached, no further data is written. Only the compressed data size is counted against the volume quota.
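The quota behaviour above can be modelled with a small check (illustrative only; MapR enforces this server-side, and the compressed size is what counts):

```python
def quota_status(compressed_gb, advisory_gb, hard_gb):
    """Volume quota behaviour per the slide: crossing the advisory
    quota raises an alarm; crossing the hard quota blocks writes."""
    if compressed_gb >= hard_gb:
        return "writes blocked"
    if compressed_gb >= advisory_gb:
        return "alarm raised"
    return "ok"

print(quota_status(50, advisory_gb=80, hard_gb=100))   # ok
print(quota_status(85, advisory_gb=80, hard_gb=100))   # alarm raised
print(quota_status(100, advisory_gb=80, hard_gb=100))  # writes blocked
```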
  21. Pre / Post-Installation Checks: Pre-installation: stream (CPU and memory), iozone (I/O speed; destructive write/read), rpctest (network speed). Post-installation: DFSIO (I/O speed, run as a MapReduce job), RWSpeedTest, and TeraGen / TeraSort (MapReduce jobs). A poor TeraSort result suggests a possible problem with a hard drive or controller.
  22. Snapshot / Mirror: Snapshots are stored at the top level of every volume, in a hidden directory. Scheduled snapshots expire automatically. mirror start begins a mirror operation between source and destination. mirror push pushes updates from the source volume to all mirror volumes. A mirror operation uses up to 70% of network bandwidth, and files are compressed.
  23. Role / Disk Balancer: The disk balancer redistributes data across all nodes; use it after you have added many new nodes. The percentage of concurrent disk rebalancing ranges from 2 to 30%. The role balancer evenly distributes master containers; it is off by default and starts 30 minutes after the CLDB (configurable). The delay for active data ranges from 120 to 1800 seconds (2 to 30 minutes).
  24. Job Scheduler: The Fair Scheduler is the default; FIFO and the Capacity Scheduler are also available. Scheduling can be based on memory and also on CPU. Each user has their own queue, and weights are used to set resources. The allocation file (reloaded every 10 seconds) is used to modify resource allocation: /opt/mapr/hadoop/version/etc/hadoop/fair-scheduler.xml.
