CS246: Mining Massive Datasets, Winter 2016
Hadoop Tutorial
Due 11:59pm January 12, 2016
General Instructions
The purpose of this tutorial is (1) to get you started with Hadoop and (2) to get you acquainted with the code and homework submission system. Completing the tutorial is optional, but students who hand in the results on time will earn 5 points. This tutorial is to be completed individually.
Here you will learn how to write, compile, debug and execute a simple Hadoop program. The first part of the assignment serves as a tutorial and the second part asks you to write your own Hadoop program.
Section 1 describes the virtual machine environment. Instead of the virtual machine, you are welcome to set up your own pseudo-distributed or fully distributed cluster if you prefer. Any version of Hadoop that is at least 1.0 will suffice. (For an easy way to set up a cluster, try Cloudera Manager: http://archive.cloudera.com/cm5/installer/latest/cloudera-manager-installer.bin.) If you choose to set up your own cluster, you are responsible for making sure the cluster is working properly. The TAs will be unable to help you debug configuration issues in your own cluster.
Section 2 explains how to use the Eclipse environment in the virtual machine, including how
to create a project, how to run jobs, and how to debug jobs. Section 2.5 gives an end-to-end
example of creating a project, adding code, building, running, and debugging it.
Section 3 is the actual homework assignment. There are no deliverables for Sections 1 and 2. In Section 3, you are asked to write and submit your own MapReduce job.
This assignment requires you to upload the code and hand in the output for Section 3.
All students should submit the output via GradeScope and upload the code via snap.
GradeScope: To register for GradeScope,
• Create an account on GradeScope if you don’t have one already.
• Join CS246 course using Entry Code 92B7E9
Upload the code: Put all the code for a single question into a single file and upload it at
http://snap.stanford.edu/submit/.
Questions
1 Setting up a virtual machine
• Download and install VirtualBox on your machine: http://virtualbox.org/wiki/Downloads
• Download the Cloudera Quickstart VM at https://downloads.cloudera.com/demo_vm/virtualbox/cloudera-quickstart-vm-5.5.0-0-virtualbox.zip.
• Uncompress the VM archive. It is compressed with 7-zip. If needed you can download
a tool to uncompress the archive at http://www.7-zip.org/.
• Start VirtualBox and click Import Appliance in the File dropdown menu. Click the
folder icon beside the location field. Browse to the uncompressed archive folder, select
the .ovf file, and click the Open button. Click the Continue button. Click the Import
button.
• Your virtual machine should now appear in the left column. Select it and click on Start
to launch it.
• To verify that the VM is running and you can access it, open a browser to the URL:
http://localhost:8088. You should see the resource manager UI. The VM uses port
forwarding for the common Hadoop ports, so when the VM is running, those ports on
localhost will redirect to the VM.
• Optional: Open the VirtualBox preferences (File → Preferences → Network) and select the Adapter 2 tab. Click the Enable Network Adapter checkbox. Select Host-only Adapter. If the list of networks is empty, add a new network. Click OK. If you do this step, you will be able to connect to the running virtual machine via SSH from the host OS at 192.168.56.101. The username and password are 'cloudera'.
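If you enabled the host-only adapter, you can then connect from a terminal on the host OS using the address and credentials above, for example:
ssh cloudera@192.168.56.101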
The virtual machine includes the following software:
• CentOS 6.4
• JDK 7 (1.7.0_67)
• Hadoop 2.5.0
• Eclipse 4.2.6 (Juno)
The virtual machine runs best with 4096MB of RAM, but has been tested to
function with 1024MB. Note that at 1024MB, while it did technically function,
it was very slow to start up.
2 Running Hadoop jobs
Generally, Hadoop can be run in three modes.
1. Standalone (or local) mode: There are no daemons used in this mode. Hadoop uses the local file system as a substitute for the HDFS file system. The jobs will run as if there were 1 mapper and 1 reducer.
2. Pseudo-distributed mode: All the daemons run locally on a single machine using the HDFS protocol, and this setting mimics the behavior of a cluster. There can be multiple mappers and reducers.
3. Fully-distributed mode: This is how Hadoop runs on a real cluster.
In this homework we will show you how to run Hadoop jobs in Standalone mode (very useful
for developing and debugging) and also in Pseudo-distributed mode (to mimic the behavior
of a cluster environment).
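The main setting that separates these modes is the default file system. On the Cloudera VM this is already configured for you, so the following is only for reference: a typical core-site.xml entry for pseudo-distributed mode looks roughly like this (fs.defaultFS is the standard Hadoop 2 property; the exact port can vary between distributions):

<property>
  <name>fs.defaultFS</name>
  <value>hdfs://localhost:8020</value>
</property>

In standalone mode this property is left at its default, file:///, which is why jobs read and write the local file system.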
2.1 Creating a Hadoop project in Eclipse
(There is a plugin for Eclipse that makes it simple to create a new Hadoop project and execute Hadoop jobs, but the plugin is only well maintained for Hadoop 1.0.4, which is a rather old version of Hadoop. There is a project at https://github.com/winghc/hadoop2x-eclipse-plugin that is working to update the plugin for Hadoop 2.0. You can try it out if you like, but your mileage may vary.)
To create a project:
1. Open Eclipse. If you just launched the VM, you may have to close the Firefox window
to find the Eclipse icon on the desktop.
2. Right-click on the training node in the Package Explorer and select Copy. See Figure 1.
Figure 1: Create a Hadoop Project.
3. Right-click on the training node in the Package Explorer and select Paste. See Figure 2.
Figure 2: Create a Hadoop Project.
4. In the pop-up dialog, enter the new project name in the Project Name field and click
OK. See Figure 3.
Figure 3: Create a Hadoop Project.
5. Modify or replace the stub classes found in the src directory as needed.
2.2 Running Hadoop jobs in standalone mode
Once you've created your project and written the source code, to run the project in standalone mode, do the following:
1. Right-click on the project and select Run As → Run Configurations. See Figure 4.
Figure 4: Run a Hadoop Project.
2. In the pop-up dialog, select the Java Application node and click the New launch configuration button in the upper left corner. See Figure 5.
Figure 5: Run a Hadoop Project.
3. Enter a name in the Name field and the name of the main class in the Main class field.
See Figure 6.
Figure 6: Run a Hadoop Project.
4. Switch to the Arguments tab and input the required arguments. Click Apply. See
Figure 7. To run the job immediately, click on the Run button. Otherwise click Close
and complete the following step.
Figure 7: Run a Hadoop Project.
5. Right-click on the project and select Run As → Java Application. See Figure 8.
Figure 8: Run a Hadoop Project.
6. In the pop-up dialog select the main class from the selection list and click OK. See
Figure 9.
Figure 9: Run a Hadoop Project.
After you have set up the run configuration the first time, you can skip steps 1 and 2 above in subsequent runs, unless you need to change the arguments. You can also create more than one launch configuration if you'd like, such as one for each set of common arguments.
2.3 Running Hadoop in pseudo-distributed mode
Once you've created your project and written the source code, to run the project in pseudo-distributed mode, do the following:
1. Right-click on the project and select Export. See Figure 10.
Figure 10: Run a Hadoop Project.
2. In the pop-up dialog, expand the Java node and select JAR file. See Figure 11. Click
Next >
Figure 11: Run a Hadoop Project.
3. Enter a path in the JAR file field and click Finish. See Figure 12.
Figure 12: Run a Hadoop Project.
4. Open a terminal and run the following command:
hadoop jar path/to/file.jar <input path> <output path>
After modifications to the source files, repeat all of the above steps to run the job again.
2.4 Debugging Hadoop jobs
To debug an issue with a job, the easiest approach is to run the job in stand-alone mode
and use a debugger. To debug your job, do the following steps:
1. Right-click on the project and select Debug As → Java Application. See Figure 13.
Figure 13: Debug a Hadoop project.
2. In the pop-up dialog select the main class from the selection list and click OK. See
Figure 14.
Figure 14: Run a Hadoop Project.
You can use the Eclipse debugging features to debug your job execution. See the additional
Eclipse tutorials at the end of section 2.6 for help using the Eclipse debugger.
When running your job in pseudo-distributed mode, the output from the job is logged in the task tracker's log files, which can be accessed most easily by pointing a web browser to port 8088 of the server, which will be localhost. From the job tracker web page, you can drill
down into the failing job, the failing task, the failed attempt, and finally the log files. Note
that the logs for stdout and stderr are separated, which can be useful when trying to isolate
specific debugging print statements.
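Since stdout and stderr are captured per task attempt, simple print statements and counters are often enough to localize a problem. Below is a minimal sketch of both techniques inside a mapper (assuming the usual org.apache.hadoop.io and org.apache.hadoop.mapreduce imports; the counter group and name here are made up for illustration):

public static class DebugMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
   @Override
   public void map(LongWritable key, Text value, Context context)
         throws IOException, InterruptedException {
      // Appears in the task attempt's stderr log in the web UI.
      System.err.println("map input: " + value);
      // Counters are aggregated across all tasks and reported with the job status.
      context.getCounter("Debug", "map input records").increment(1);
   }
}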
2.5 Example project
In this section you will create a new Eclipse Hadoop project, compile it, and execute it. The program will count the frequency of all the words in a given large text file. In your virtual machine, Hadoop, the Java environment, and Eclipse have already been pre-installed.
• Open Eclipse. If you just launched the VM, you may have to close the Firefox window
to find the Eclipse icon on the desktop.
• Right-click on the training node in the Package Explorer and select Copy. See Figure 15.
Figure 15: Create a Hadoop Project.
• Right-click on the training node in the Package Explorer and select Paste. See Figure 16.
Figure 16: Create a Hadoop Project.
• In the pop-up dialog, enter the new project name in the Project Name field and click
OK. See Figure 17.
Figure 17: Create a Hadoop Project.
• Create a new package called edu.stanford.cs246.wordcount by right-clicking on the
src node and selecting New → Package. See Figure 18.
Figure 18: Create a Hadoop Project.
• Enter edu.stanford.cs246.wordcount in the Name field and click Finish. See Figure 19.
Figure 19: Create a Hadoop Project.
• Create a new class in that package called WordCount by right-clicking on the edu.stanford.cs246.wordcount node and selecting New → Class. See Figure 20.
Figure 20: Create a Hadoop Project.
• In the pop-up dialog, enter WordCount as the Name. See Figure 21.
Figure 21: Create a Hadoop Project.
• In the Superclass field, enter Configured and click the Browse button. From the pop-up window select Configured - org.apache.hadoop.conf and click the OK button. See Figure 22.
Figure 22: Create a java file.
• In the Interfaces section, click the Add button. From the pop-up window select Tool - org.apache.hadoop.util and click the OK button. See Figure 23.
Figure 23: Create a Java file.
• Check the boxes for public static void main(String[] args) and Inherited abstract
methods and click the Finish button. See Figure 24.
Figure 24: Create WordCount.java.
• You will now have a rough skeleton of a Java file as in Figure 25. You can now add
code to this class to implement your Hadoop job.
Figure 25: Create WordCount.java.
• Rather than implement a job from scratch, copy the contents of
http://snap.stanford.edu/class/cs246-data-2014/WordCount.java and paste them into the WordCount.java
file. See Figure 26. The code in WordCount.java calculates the frequency of each word
in a given dataset.
Figure 26: Create WordCount.java.
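The actual contents live at the URL above. For orientation, here is a minimal sketch of what a standard Hadoop 2.x word count looks like; it follows the Configured/Tool structure set up in the earlier steps but is not necessarily identical to the provided file:
package edu.stanford.cs246.wordcount;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class WordCount extends Configured implements Tool {

    public static class Map extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            // Emit (word, 1) for every whitespace-separated token on the line.
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, ONE);
            }
        }
    }

    public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            // Sum all counts emitted for this word.
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    @Override
    public int run(String[] args) throws Exception {
        // args[0] is the input path, args[1] the output directory.
        Job job = Job.getInstance(getConf(), "WordCount");
        job.setJarByClass(WordCount.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        job.setMapperClass(Map.class);
        job.setReducerClass(Reduce.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new WordCount(), args));
    }
}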
• Download the Complete Works of William Shakespeare from Project Gutenberg at
http://www.gutenberg.org/cache/epub/100/pg100.txt. You can do this simply
with cURL, but you also have to be aware of the byte order mark (BOM). You can
download the file and remove the BOM in one line by opening a terminal, changing to
the ~/workspace/WordCount directory, and running the following command:
curl http://www.gutenberg.org/cache/epub/100/pg100.txt | perl -pe 's/^\xEF\xBB\xBF//' > pg100.txt
If you copy and paste the above command, beware the quotes: copy/paste will likely
turn the plain single quotes into typographic quotes, which the shell will not accept.
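To confirm that the BOM was actually stripped, one option (assuming the standard od utility is available in the VM) is to dump the first bytes of the file:
od -c pg100.txt | head -n 1
The UTF-8 BOM would appear as the octal bytes 357 273 277 at the very start; if the perl filter worked, the dump should begin with ordinary text instead.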
• Right-click on the project and select Run As → Run Configurations. See Figure 27.
Figure 27: Run WordCount.java.
• In the pop-up dialog, select the Java Application node and click the New launch con-
figuration button in the upper left corner. See Figure 28.
Figure 28: Run WordCount.java.
• Enter a name in the Name field and WordCount in the Main class field. See Figure 29.
Figure 29: Run WordCount.java.
• Switch to the Arguments tab and put pg100.txt output in the Program arguments
field. See Figure 30. Click Apply and Close.
Figure 30: Run WordCount.java.
• Right-click on the project and select Run As → Java Application. See Figure 31.
Figure 31: Run WordCount.java.
• In the pop-up dialog select WordCount - edu.stanford.cs246.wordcount from the selec-
tion list and click OK. See Figure 32.
Figure 32: Run WordCount.java.
You will see the command output in the console window, and if the job succeeds,
you’ll find the results in the ~/workspace/WordCount/output directory. If the job
fails complaining that it cannot find the input file, make sure that the pg100.txt file
is located in the ~/workspace/WordCount directory.
• Right-click on the project and select Export. See Figure 33.
Figure 33: Export a Hadoop project.
• In the pop-up dialog, expand the Java node and select JAR file. See Figure 34. Click
Next >.
Figure 34: Export a Hadoop project.
• Enter /home/cloudera/wordcount.jar in the JAR file field and click Finish. See
Figure 35.
Figure 35: Export a Hadoop project.
If a dialog appears warning that the project compiled with warnings, you can simply
click OK.
• Open a terminal in your VM, navigate to /home/cloudera, and run the following
commands:
hadoop fs -put workspace/WordCount/pg100.txt
hadoop jar wordcount.jar edu.stanford.cs246.wordcount.WordCount pg100.txt output
• Run the command: hadoop fs -ls output
You should see an output file for each reducer. Since there was only one reducer for
this job, you should only see one part-* file. Note that sometimes the files will be
called part-NNNNN, and sometimes they’ll be called part-r-NNNNN. See Figure 36.
Figure 36: Run WordCount job.
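For reference, the output directory of a successful MapReduce job typically contains an empty _SUCCESS marker plus one part file per reducer, so the listing should resemble (file names illustrative):
output/_SUCCESS
output/part-r-00000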
• Run the command:
hadoop fs -cat output/part* | head
You should see the same output as when you ran the job locally, as shown in Figure
37.
Figure 37: Run WordCount job.
• To view the job's logs, open the browser in the VM and point it to
http://localhost:8088, as in Figure 38.
Figure 38: Run WordCount job.
• Click on the link for the completed job. See Figure 39.
Figure 39: View WordCount job logs.
• Click the link for the map tasks. See Figure 40.
Figure 40: View WordCount job logs.
• Click the link for the first attempt. See Figure 41.
Figure 41: View WordCount job logs.
• Click the link for the full logs. See Figure 42.
Figure 42: View WordCount job logs.
2.6 Using your local machine for development
If you'd rather use your own development environment instead of working inside the VM,
follow these steps:
1. Make sure that you have an entry for localhost.localdomain in your /etc/hosts
file, e.g.
127.0.0.1 localhost localhost.localdomain
2. Install a copy of Hadoop locally. The easiest way to do that is to simply download
the archive from http://archive.cloudera.com/cdh5/cdh/5/hadoop-latest.tar.gz
and unpack it.
3. In the unpacked archive, you'll find an etc/hadoop directory. In that directory, open
the core-site.xml file and modify it as follows:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://192.168.56.101:8020</value>
  </property>
</configuration>
4. Next, open the yarn-site.xml file in the same directory and modify it as follows:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>192.168.56.101</value>
  </property>
</configuration>
You can now run the Hadoop binaries located in the bin directory in the archive, and
they will connect to the cluster running in your virtual machine.
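For example (assuming the archive was unpacked to ~/hadoop and the VM is reachable at 192.168.56.101, as configured above), the following should list the root of the VM cluster's HDFS:
cd ~/hadoop
bin/hadoop fs -ls /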
Further Hadoop tutorials
• Yahoo! Hadoop Tutorial: http://developer.yahoo.com/hadoop/tutorial/
• Cloudera Hadoop Tutorial:
http://www.cloudera.com/content/www/en-us/training/library/tutorials.html
• How to Debug MapReduce Programs:
http://wiki.apache.org/hadoop/HowToDebugMapReducePrograms
Further Eclipse tutorials
• General Eclipse tutorial:
http://www.vogella.com/articles/Eclipse/article.html.
• Tutorial on how to use the Eclipse debugger:
http://www.vogella.com/articles/EclipseDebugging/article.html.
3 Task: Write your own Hadoop Job
Now you will write your first MapReduce job to accomplish the following task:
• Write a Hadoop MapReduce program that outputs the number of words that start
with each letter: for every letter, count the total number of words beginning with
that letter. In your implementation, ignore letter case, i.e., treat all words as lower
case, and ignore all non-alphabetic characters.
• Run your program over the same input data as above.
What to hand in: Hand in a printout of the output file and upload the source code.