A tutorial presentation based on the github.com/amplab/shark documentation.
I gave this presentation at Amirkabir University of Technology as a Teaching Assistant for Dr. Amir H. Payberah's Cloud Computing course in the spring semester of 2015.
2. Purpose
This guide describes how to get Shark running locally. It creates a small Hive installation on one machine and allows you to execute simple queries.
The only prerequisites for this guide are Java and Scala 2.9.3 installed on your machine. If you don't have Scala 2.9.3, you can download it by running:
$ wget http://www.scala-lang.org/files/archive/scala-2.9.3.tgz
$ tar xvfz scala-2.9.3.tgz
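If you want to confirm the prerequisites before continuing, a quick check might look like this (a minimal sketch; the SCALA_HOME path is an assumption based on extracting the tarball into the current directory):
$ java -version                       # any installed JDK should print its version
$ export SCALA_HOME=$PWD/scala-2.9.3  # assumed extraction path
$ $SCALA_HOME/bin/scala -version      # should report version 2.9.3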
3. Running Shark In Other Modes
• You can also run Shark in one of the three other supported modes:
• Running Shark on EC2
• Running Shark on a Cluster
• Running Shark with Tachyon
4. Let’s Start…(1/3)
• Download the binary distribution of Shark 0.8.
• The package contains two folders, shark-0.8.0 and hive-0.9.0-shark-0.8.0-bin.
$ wget https://github.com/amplab/shark/releases/download/v0.8.0/shark-0.8.0-bin-hadoop1.tgz # Hadoop 1/CDH3
- or -
$ wget https://github.com/amplab/shark/releases/download/v0.8.0/shark-0.8.0-bin-cdh4.tgz # Hadoop 2/CDH4
$ tar xvfz shark-*-bin-*.tgz
$ cd shark-*-bin-*
• The Shark code is in the shark-0.8.0/ directory.
5. Let’s Start…(2/3)
• To set up your environment to run Shark locally, you need to set the HIVE_HOME and SCALA_HOME environment variables in the file shark-0.8.0/conf/shark-env.sh to point to the folders you just downloaded.
• Shark comes with a template file shark-env.sh.template that you can copy and modify to get started:
$ cp shark-0.8.0/conf/shark-env.sh.template shark-0.8.0/conf/shark-env.sh
• Now edit the following two lines in shark-env.sh:
export HIVE_HOME=/path/to/hive-0.9.0-shark-0.8.0-bin
export SCALA_HOME=/path/to/scala-2.9.3
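For reference, a filled-in shark-env.sh might look like the following (a minimal sketch; the /opt paths are assumptions for wherever you extracted the packages, and JAVA_HOME only needs exporting if your shell doesn't already set it):
export JAVA_HOME=/usr/lib/jvm/default-java        # assumption: adjust to your JDK location
export HIVE_HOME=/opt/hive-0.9.0-shark-0.8.0-bin  # assumed extraction path
export SCALA_HOME=/opt/scala-2.9.3                # assumed extraction path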
6. Let’s Start…(3/3)
• Next, create the default Hive warehouse directory. This is where Hive will store table data for native tables:
$ sudo mkdir -p /user/hive/warehouse
$ sudo chmod 0777 /user/hive/warehouse # Or make your username the owner
• You can now start the Shark CLI:
$ ./bin/shark
• In addition to the Shark CLI, there are several executables in shark-0.8.0/bin:
bin/shark-withdebug : Runs the Shark CLI with DEBUG-level logs printed to the console.
bin/shark-withinfo : Runs the Shark CLI with INFO-level logs printed to the console.
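Once the CLI is up (via any of these launchers), a quick smoke test might look like this (a minimal sketch; the shark> prompt text is an assumption, but show tables; is standard HiveQL and should return an empty list on a fresh install):
$ ./bin/shark
shark> show tables;
shark> exit;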
7. Lab Assignment
1. Launch the Shark shell.
2. Create a table called book … .
3. List all the columns of the table book.
4. Load the book table from the file books in the local filesystem.
5. Create a table called novel, containing those records from table book … .
6. Print out the list of available tables.
7. Count the number of records from the table book.
8. Print out the total cost of the books with authors who have the same last name.
9. Count the number of distinct last names.
10. Drop the tables.
8. Lab Assignment 5 (1/5)
1. Launch the Shark shell.
2. Create a table called book whose schema includes the book's title, description, author's first name, last name, and cost.
3. List all the columns of the table book.
$ ./bin/shark
create table book(title string, description string, firstname string, lastname string, cost int)
row format delimited fields terminated by '\t';
describe book;
9. Lab Assignment 5 (2/5)
4. Load the book table from the file books in the local filesystem. The books file has the following format (fields separated by tabs):
Speed love    Long book about love     Brian    Dog     10
Long day      Story about Monday       Emily    Blue    20
Flying Car    Novel about airplanes    Phil     High    5
Short day     Novel about a day        Phil     Dog     30
load data local inpath 'books' into table book;
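If you need to create the books file yourself, the following would produce the tab-separated layout the table definition expects (a minimal sketch; the file name and working directory match what the load command above assumes):
$ printf 'Speed love\tLong book about love\tBrian\tDog\t10\n'   > books
$ printf 'Long day\tStory about Monday\tEmily\tBlue\t20\n'     >> books
$ printf 'Flying Car\tNovel about airplanes\tPhil\tHigh\t5\n'  >> books
$ printf 'Short day\tNovel about a day\tPhil\tDog\t30\n'       >> books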
10. Lab Assignment 5 (3/5)
As an alternative solution, you can create an external table. The external keyword lets you create a table and provide a location, so that Hive does not use its default location for this table. This is useful if you already have the data generated.
create external table
exbook(title string, description string, firstname string, lastname string, cost int)
row format delimited fields terminated by '\t'
location '<file location, excluding the name of the file>';
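For instance, if the books file already lives in a directory such as /home/user/bookdata (an assumed path; note that location takes the directory, not the file), you would prepare it like this and pass '/home/user/bookdata' to the location clause above:
$ mkdir -p /home/user/bookdata
$ cp books /home/user/bookdata/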
5. Create a table called novel, containing those records from table book that have the keyword “novel” in their description, and cache it in memory.
create table novel TBLPROPERTIES('shark.cache'='MEMORY_ONLY')
as select * from book where description like "%Novel%";
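Shark 0.8 also documented a naming convention under which tables whose names end in _cached are cached automatically; if that applies to your build, a roughly equivalent alternative (hedged, not verified against this exact release) would be:
create table novel_cached as select * from book where description like "%Novel%";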
11. Lab Assignment 5 (4/5)
6. Print out the list of available tables.
show tables;
7. Count the number of records from the table book.
select count(*) from book;
8. Print out the total cost of the books with authors who have the same last name.
select lastname, sum(cost) from book group by lastname;
9. Count the number of distinct last names.
select count(distinct lastname) from book;
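As a sanity check, with the four sample rows loaded earlier these queries should return roughly the following (hand-worked from the sample data, not captured Shark output):
select count(*) from book;                              -- 4
select lastname, sum(cost) from book group by lastname; -- Blue 20, Dog 40, High 5
select count(distinct lastname) from book;              -- 3 (Blue, Dog, High)
-- the novel table created earlier should contain 2 rows (Flying Car and Short day)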
12. Lab Assignment 5 (5/5)
10. Drop the tables.
drop table book;
drop table novel;