Big Data: Big SQL web tooling (Data Server Manager) self-study lab
IBM
Getting started with Big SQL's Web tooling (Data Server Manager 4.2)
Cynthia M. Saracco
IBM Solution Architect
Sept. 15, 2016
Contents
LAB 1 OVERVIEW
1.1. WHAT YOU'LL LEARN
1.2. ABOUT YOUR ENVIRONMENT
1.3. GETTING STARTED
LAB 2 EXPLORING THE BASICS
2.1. LAUNCHING BIGINSIGHTS HOME AND BIG SQL WEB TOOLING
2.2. CONNECTING TO YOUR DATABASE
2.3. USING HELP FACILITIES
LAB 3 EXPLORING AND QUERYING YOUR DATABASE
3.1. EXPLORING THE CONTENTS OF YOUR DATABASE
3.2. ISSUING QUERIES AND INSPECTING RESULTS
3.3. OPTIONAL: USING SYNTAX ASSISTANCE
LAB 4 MONITORING YOUR ENVIRONMENT
4.1. SETTING UP YOUR ENVIRONMENT
4.2. EXAMINING DATABASE METRICS
4.3. EXAMINING ALERTS
LAB 5 COLLECTING STATISTICS AND VIEWING DATA ACCESS PLANS
5.1. COLLECTING STATISTICS
5.2. USING EXPLAIN
LAB 6 SUMMARY
Lab 1 Overview
This hands-on lab introduces you to Data Server Manager, a Web tool for querying and monitoring your
Big SQL database. Data Server Manager (DSM) and Big SQL support select Apache Hadoop platforms.
1.1. What you'll learn
After completing all exercises in this lab guide, you'll know how to
• Launch the web tooling.
• Connect to your Big SQL server.
• Execute Big SQL queries and inspect results.
• Explore Big SQL database metrics and alerts.
• Collect statistics.
• Generate and inspect data access plans.
Allow 1.5 - 2 hours to complete all sections of this lab. For additional information about DSM capabilities,
please visit IBM’s DSM site at http://www-03.ibm.com/software/products/en/ibm-data-server-manager. If
you have questions or comments about this lab, please post them to the forum on Hadoop Dev at
https://developer.ibm.com/hadoop/support/.
Separate labs are available on getting started with Big SQL, using Big SQL with HBase, and using Spark
to access Big SQL data.
1.2. About your environment
This lab requires an environment in which Big SQL 4.2, BigInsights Home, DSM, and all pre-requisite
services are installed and running.
The primary test environment for this lab was a multi-node cluster running BigInsights 4.2 as a cloud
service on IBM Bluemix (www.bluemix.net). Sample configuration information is shown in the table
below. Modify the code samples and instructions as needed to match your configuration.
Property                        Value
Host name                       myhost.ibm.com
User ID (biadmin authority)     yourID
Password                        yourPass
Ambari port number              9443
Big SQL database name           bigsql
Big SQL port number             51000
Big SQL installation directory  /usr/ibmpacks/bigsql
Big SQL samples directory       /usr/ibmpacks/bigsql/4.2.0.0/bigsql/samples/data
BigInsights Home                https://myhost.ibm.com:8443/gateway/default/BigInsightsWeb/index.html#/welcome
About the screen captures, sample code, and environment configuration
Screen captures in this lab depict examples and results that may vary from what you
see when you complete the exercises. In addition, some code examples may need to
be customized to match your environment.
1.3. Getting started
To get started with the lab exercises, you need access to a working Big SQL environment. A free IBM
BigInsights Quick Start Edition is available for download at
http://www-01.ibm.com/software/data/infosphere/hadoop/trials.html. In addition, the BigInsights for Apache Hadoop
cloud service is available on Bluemix (www.bluemix.net). The Enterprise plan (a fee-based offering
sometimes referred to as “eHaaS” or enterprise Hadoop as a service) includes the required components
for this lab.
This lab was tested against a BigInsights 4.2 cloud service on a multi-node cluster. For information
about how to install and configure BigInsights on your own cluster, consult the product's Knowledge
Center
(http://www.ibm.com/support/knowledgecenter/SSPT3X_4.2.0/com.ibm.swg.im.infosphere.biginsights.welcome.doc/doc/welcome.html).
Before continuing with this lab, verify that Big SQL, DSM, and all their pre-requisite services are running.
If you have any questions or need help getting your environment up and running, visit Hadoop Dev
(https://developer.ibm.com/hadoop/) and review the product documentation or post a
message to the forum. You cannot proceed with subsequent lab exercises without access to a
working environment.
Lab 2 Exploring the basics
IBM provides Web tools that enable you to inspect Big SQL database metrics, issue Big SQL statements,
and perform other functions. These tools are part of IBM Data Server Manager (DSM).
After completing this lab, you will know how to:
• Launch the BigInsights Home page and the Big SQL Web tooling (DSM).
• Establish a connection to your Big SQL database.
• Access online help.
Prior to beginning this lab, you will need access to a Hadoop cluster in which BigInsights Home, Big
SQL, DSM, Ambari, and all pre-requisite services are running.
Allow 30 minutes to complete this lab.
2.1. Launching BigInsights Home and Big SQL Web tooling
Big SQL Web tooling is accessed through a link in the BigInsights Home page. In this exercise, you’ll
verify that BigInsights Home and Big SQL Web tooling (DSM) are installed and running on your cluster.
Then you’ll launch the Home page and the Web tooling.
__1. Launch Ambari and sign into its console. If necessary, consult a separate lab on Getting Started
with Big SQL for details on how to do this.
__2. From the Ambari dashboard, inspect the list of services in the left pane. Scroll down as needed
to verify that BigInsights Home, BigInsights – Big SQL, and BigInsights Data Server Manager are
running, as well as all pre-requisite services (e.g., HDFS, MapReduce2, Hive, and Knox). The
previous screen capture depicts a subset of the Ambari dashboard.
__3. If you are using the BigInsights cloud service on Bluemix, skip to the next step. Otherwise,
__a. Click on the Knox service.
__b. Start the LDAP demo.
__c. Click the Ambari Hosts tab and expand the information about the nodes in your cluster to
locate where Knox is running.
__4. Launch BigInsights Home, providing the appropriate URL based on your installation’s
configuration. (If you installed BigInsights with Knox on your own hardware and accepted default
installation values, the BigInsights Home URL is similar to the link shown below. Substitute the
location of the Knox gateway on your cluster for the italicized text in this example.)
https://yourKnoxGatewayNode:8443/gateway/default/BigInsightsWeb/index.html#/welcome
If you're unable to launch BigInsights Home, you may have an installation or configuration
problem. Consult the product documentation or post a message to the forum on Hadoop Dev
at https://developer.ibm.com/answers?community=hadoop. You cannot proceed with
subsequent lab exercises without access to a working environment.
__5. When prompted, enter a valid user ID and password. (For the Bluemix cloud service, enter the
user ID and password you were given. For native installations, provide a valid ID for the Knox
gateway service. Defaults are guest / guest-password.)
__6. Verify that BigInsights Home displays an item for Data Server Manager. (Your display may vary
from the screen image below, depending on the components installed on your cluster.)
__7. Click the Launch button in the Data Server Manager box. Your screen may appear somewhat
different than the image below when you launch the tool for the first time.
2.2. Connecting to your database
In this exercise, you will establish a connection to your Big SQL database.
__1. With the Big SQL web tooling launched, click Settings > Manage Connections.
__2. Inspect the database connection display. If any connections were previously created, they will
appear here. (The screen capture below was taken from a system in which 2 connections were
defined.)
Host names masked in this lab
Host names in this screen image, and in subsequent screen images in
this lab, have been masked to discourage hacking. Your display will
show the full host name information appropriate for your cluster.
__3. Click Add to add a new connection.
__4. Inspect the pop-up menu that appears.
__5. Complete the required information for your connection, scrolling down if needed to expose all
menu items. Sample data is shown below. Adjust the host name, port number and any other
information as needed to match your environment. Future examples in this lab presume that
your connection name is bigsql.
If your cluster requires a Secure Sockets Layer (SSL) connection, click Advanced JDBC
Properties, add a property named sslConnection, and set its value to true. (The BigInsights for
Apache Hadoop 4.2 enterprise cloud service requires this.)
__6. Click Test Connection and verify that the operation succeeds.
__7. Click OK to clear the message.
__8. Click OK again to save the connection.
__9. Verify that your new connection appears in the Database Connections window.
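Once the connection is saved, a quick sanity check is to run a trivial statement through it from the SQL editor (Run SQL in the menu at left, covered in Lab 3). The following is a minimal sketch and not part of the original lab steps. It assumes that your Big SQL release accepts the DB2-style special registers CURRENT SERVER and CURRENT USER and the one-row table SYSIBM.SYSDUMMY1.

```sql
-- Assumption: DB2-style special registers are available in your Big SQL release.
-- Returns the name of the database you are connected to and your authorization ID.
SELECT CURRENT SERVER AS database_name,
       CURRENT USER   AS connected_as
FROM sysibm.sysdummy1;
```

The first column should correspond to the database name you entered for the connection, and the second to the user ID you supplied.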
2.3. Using Help facilities
Online tutorials and reference information are part of DSM. This short exercise introduces you to
available Help facilities.
__1. With the Big SQL web tooling launched, click Help > Open Help.
__2. Inspect the information in the new pane that appears on your dashboard, a subset of which is
shown in the following screen image:
__3. Explore the various information available to you, including video demos, a forum, and a Help
search facility.
__4. Optionally, click the arrow beside the Help pane to close it.
__5. Inspect details about your DSM installation. Click Help > About.
__6. Review the version and build information that appears in a new browser tab.
__7. Optionally, click the Plug-ins and System Properties tabs to reveal further details. A subset of
the information available in each is shown below. Close the tabs when done.
Lab 3 Exploring and querying your database
Now that you’ve launched DSM and established a database connection, you’re ready to write some
queries and explore the contents of your database. After completing this lab, you will know how to:
• Explore tables in your database.
• Execute queries and inspect results.
• Use the SQL syntax assistant.
Prior to starting this lab, it will be helpful if you have created and populated the GO_REGION_DIM
sample table presented in the Querying Structured Data module of the Getting Started with Big SQL lab
(http://www.slideshare.net/CynthiaSaracco/big-sql40-hol).
Allow 30 minutes to complete this lab.
3.1. Exploring the contents of your database
In this exercise, you will work with DSM administrative tooling to explore the tables and views in your
database.
__1. If necessary, connect to your Big SQL database. Locate the IBM Data Server Manager title bar
at top. Just to the right is a box with a drop-down arrow that lists stored connections. Click on
the arrow to expose its contents.
__2. Select the connection that you created in the previous lab.
__3. Click Administer > Tables in the menu at left.
__4. Inspect the list of tables displayed. These are tables stored on the Big SQL head node,
including system-supplied tables and any EXPLAIN tables that might have been created as part
of the separate Getting Started with Big SQL lab. Such tables are “local” or single-node
tables; they are not distributed across your Hadoop cluster.
__5. Display tables created in HDFS (i.e., tables distributed across your cluster). Click Administer >
Hadoop Tables. The following screen capture was taken from a system in which multiple users
created and populated sample tables discussed in the separate Getting Started with Big SQL
lab.
__6. Click on a table to expose details about it, using the scroll bar as needed. The example below
shows information about the SARACCO.GO_REGION_DIM table. Note that this is a Hive-
managed table stored in the Hive warehouse and that the default SerDe (LazySimpleSerDe) is
used to process its contents.
__7. Close the tab for the table.
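The column details shown in the table tab can also be read from the system catalog with a query in the SQL editor. The following is a sketch, assuming that your Hadoop tables are registered in the SYSCAT.COLUMNS catalog view and that the sample table was created in the SARACCO schema. Substitute your own schema name, keeping in mind that catalog names are stored in upper case.

```sql
-- List the columns of the sample table in definition order.
-- 'SARACCO' is the schema used in this lab's examples; change it to match yours.
SELECT colno, colname, typename, length
FROM syscat.columns
WHERE tabschema = 'SARACCO'
  AND tabname   = 'GO_REGION_DIM'
ORDER BY colno;
```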
3.2. Issuing queries and inspecting results
In this exercise, you will execute a query against the system catalog and inspect the results.
__1. Click Run SQL in the menu at left.
__2. Verify that you have an active database connection. If necessary, consult the first few steps of
the prior lab for instructions on using a database connection.
__3. Query the system catalog for information about tables in your database. Paste the following
query into the SQL editor pane and click Run All.
select tabschema, tabname from syscat.tables fetch first 15 rows only;
__4. Inspect the Results pane at bottom and verify that the query completed successfully.
__5. To view the query results, click the Data link in the Query Results column.
__6. Inspect the results, using the vertical scroll bar at right as needed.
__7. Close the Data tab and return to the SQL editor.
__8. Optionally, introduce an error into your query so you can explore a simple debugging scenario.
__a. Edit your original query to refer to a non-existent catalog table: syscat.table. The correct
catalog table name is syscat.tables, so simply remove the trailing “s”.
select tabschema, tabname from syscat.table fetch first 15 rows only;
__b. Highlight the edited statement with your cursor.
__c. Click the drop-down arrow to the right of the Run All button. Click Run Selected to
execute only your highlighted query. (This approach to query execution is convenient if
you have multiple queries in your SQL editor pane and only want to execute a subset of
them.)
__d. Inspect the results, noting that the query failed.
__e. Click the Log link in the Query Results column for this query. Inspect the details
displayed to learn more about the query failure. Click OK when done.
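The catalog query in step 3 returns an arbitrary set of 15 rows. A variation that restricts the output to a single schema and sorts it is often easier to read. The sketch below assumes the SARACCO schema used elsewhere in this lab; replace it with the schema that holds your tables.

```sql
-- Tables and views in one schema, in name order.
-- TYPE is a one-character code in SYSCAT.TABLES (for example, 'T' for table, 'V' for view).
SELECT tabschema, tabname, type
FROM syscat.tables
WHERE tabschema = 'SARACCO'
ORDER BY tabname
FETCH FIRST 15 ROWS ONLY;
```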
3.3. Optional: Using syntax assistance
DSM includes a syntax assistant to help you visually construct simple Big SQL queries. This exercise
introduces you to this feature of the SQL editor. Prior to completing this exercise, you must have created
and populated the GO_REGION_DIM sample table outlined in the Querying Structured Data module of
the Getting Started with Big SQL lab (http://www.slideshare.net/CynthiaSaracco/big-sql40-hol).
__1. With the SQL editor launched, click Syntax Assist.
__2. When a list of tables appears, click on the GO_REGION_DIM table.
__3. Use Control-Click to select the following columns: COUNTRY_KEY, COUNTRY_CODE, and
ISO_THREE_LETTER_CODE. Click OK.
__4. Inspect the query that appears in the SQL editor. You can manually modify the query if desired
(perhaps to add a WHERE clause). However, in this case, simply execute the query.
__5. Inspect the results. As you did in an earlier exercise, click on the Data link in the Query Results
column for this query.
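For reference, the statement produced by the syntax assistant for these three columns should resemble the sketch below; the exact text and formatting that DSM generates may differ. The WHERE clause illustrates the kind of manual modification mentioned in step 4. It was added by hand, and the value 'USA' is an assumed sample value rather than something taken from this lab.

```sql
-- Columns selected in step 3; change the schema name to match your environment.
-- The WHERE clause is a hypothetical hand-written filter ('USA' is an assumed value).
SELECT country_key, country_code, iso_three_letter_code
FROM saracco.go_region_dim
WHERE iso_three_letter_code = 'USA';
```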
Lab 4 Monitoring your environment
You can monitor your database and view event alerts through DSM. This lab introduces you to some
key capabilities available through the monitoring facilities. However, a full discussion of DSM's
monitoring facilities is beyond the scope of this introductory lab.
After completing this lab, you will know how to:
• Examine metrics associated with your Big SQL database.
• Explore system-generated alerts about database events.
Allow 15 - 30 minutes to complete this lab.
Prior to starting this lab, create and populate the sample tables outlined in the Querying Structured Data
module of the Getting Started with Big SQL lab (http://www.slideshare.net/CynthiaSaracco/big-sql40-hol).
4.1. Setting up your environment
Before inspecting metrics about your database activities, you will run some queries to generate a very
small workload.
__1. If you haven't already done so, create and populate the sample tables outlined in the Querying
Structured Data module of the Getting Started with Big SQL lab
(http://www.slideshare.net/CynthiaSaracco/big-sql40-hol). You can use the DSM SQL editor to
run the required Big SQL commands, if desired.
__2. Copy the following query into the DSM Big SQL Editor, changing the schema name of each table
as needed. (For example, if the tables were created in the “user1” schema, change the name of
each table from “saracco.sls_sales_fact” to “user1.sls_sales_fact” and so on.) This query will
retrieve data about goods sold by a fictional retailer:
SELECT pnumb.product_name, sales.quantity,
meth.order_method_en
FROM
saracco.sls_sales_fact sales,
saracco.sls_product_dim prod,
saracco.sls_product_lookup pnumb,
saracco.sls_order_method_dim meth
WHERE
pnumb.product_language='EN'
AND sales.product_key=prod.product_key
AND prod.product_number=pnumb.product_number
AND meth.order_method_key=sales.order_method_key;
__3. Next, run the following query. Again, modify the table names as needed to match your schema.
SELECT pll.product_line_en AS Product,
md.order_method_en AS Order_method,
sum(sf.QUANTITY) AS total
FROM
saracco.sls_order_method_dim AS md,
saracco.sls_product_dim AS pd,
saracco.sls_product_line_lookup AS pll,
saracco.sls_product_brand_lookup AS pbl,
saracco.sls_sales_fact AS sf
WHERE
pd.product_key = sf.product_key
AND md.order_method_key = sf.order_method_key
AND pll.product_line_code = pd.product_line_code
AND pbl.product_brand_code = pd.product_brand_code
GROUP BY pll.product_line_en, md.order_method_en;
4.2. Examining database metrics
DSM provides metrics about your Big SQL database and its overall health. This exercise shows you how
to launch the database monitoring tools and begin exploring some metrics. For further details, consult
the product Knowledge Center or online help information.
__1. Launch the database monitoring facility. Click Monitor > Database.
__2. Inspect the Overview information presented for the members (nodes) of your cluster on which
Big SQL is installed. If necessary, click the Members tab beneath the Overview
tab.
The image below was taken from a cluster with one Big SQL head node and 3 Big SQL worker
nodes. Included are summaries related to CPU usage, I/O, and sorting. Specific data you see
on your screen will vary from what’s shown below.
__3. Expose further information about your Big SQL environment. Click on the Data Server tab
beneath the Overview tab.
Note that an overview of database-related activities appears in the pane at left under Database
Time Breakdown. Also note that a collection of key metrics appears at right. Specific data you
see on your screen will vary from what’s shown below.
__4. Explore the impact of recently executed SQL statements. Click the Statements tab (next to the
Overview tab). By default, information about any in-flight SQL statements is displayed.
__5. Click the Package Cache tab to display information about recently executed statements.
__6. Inspect the results, using the horizontal and vertical scroll bars as needed to display full
information. Note the various data available for each statement regarding average CPU time,
rows read, rows returned, etc.
__7. Click on a given statement. A pop-up window appears showing the full query.
__8. Click Close to close the query display window.
__9. With the query selected, click the View Details button at top.
__10. Inspect further details about your statement, clicking on the plus sign (+) beside each menu item
to expose information. A portion of such data is shown below.
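Similar per-statement figures can be retrieved with SQL. The sketch below assumes that the DB2 monitoring table function MON_GET_PKG_CACHE_STMT is available in your Big SQL release and that your ID has the authority to call it. This lab does not state which interfaces DSM itself uses to populate the Package Cache tab, so treat the query as an approximation of that view rather than its source.

```sql
-- Ten cached statements with the highest total CPU time across all members (-2).
-- NULLIF avoids a divide-by-zero for entries that have not been executed.
SELECT num_executions,
       total_cpu_time / NULLIF(num_executions, 0) AS avg_cpu_time,
       rows_read,
       rows_returned,
       SUBSTR(stmt_text, 1, 100) AS stmt_start
FROM TABLE(MON_GET_PKG_CACHE_STMT(NULL, NULL, NULL, -2)) AS t
ORDER BY total_cpu_time DESC
FETCH FIRST 10 ROWS ONLY;
```

If you ran the two queries from the previous exercise, their text should appear among the cached statements.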
4.3. Examining alerts
DSM includes an alert facility that reports information about potential problems related to database
operations. This exercise shows you how to launch this facility and begin exploring some of its
capabilities. For further details, consult the product Knowledge Center or online help information.
__1. Launch the alert facility. Click Monitor > Alerts.
__2. Inspect the output that appears in the primary pane. By default, open events will appear here.
This screen capture was taken from a system in which no significant events were open.
__3. Optionally, display information related to a broader range of events. From the drop-down menu
at left, change the scope of alerts to All Open and Closed. This will cause a new Duration item
to appear. Choose a longer duration, such as the Last 30 Days.
Inspect the results. This screen capture was taken from a system in which an event
compromising database availability had been detected and closed.
If any system-issued alerts appear on your screen, click on the checkbox next to an alert and
then click the View details button.
Further information related to the alerting event will appear, beginning with a summary section
exposed through the What's Wrong? page.
If desired, explore the details and recommended actions provided for your alert.
__4. In the upper right corner, click on the Alerts button to expose a list of recent alerts opened in the
last 5 minutes.
Lab 5 Collecting statistics and viewing data access plans
This lab shows you how to use DSM to collect statistics about your tables to help the Big SQL query
optimizer make well-informed decisions about data access plans for your queries. In addition, this lab
shows you how to view the data access plan selected for a given query using EXPLAIN. The tasks in
this lab are similar to those covered in a separate module on data access plans included in the Getting
Started with Big SQL lab (http://www.slideshare.net/CynthiaSaracco/big-sql40-hol). However, in this lab,
you will use Web tooling rather than a command-line interface to collect statistics and examine data
access plans.
After completing this lab, you will know how to:
• Collect metadata (statistics) about your data.
• Collect and review data access plans for your queries.
Prior to beginning this lab, you need to create and populate the SLS_PRODUCT_DIM table as outlined
in the Querying Structured Data module of the Getting Started with Big SQL lab
(http://www.slideshare.net/CynthiaSaracco/big-sql40-hol).
Allow 30 minutes to complete this lab.
5.1. Collecting statistics
DSM enables you to "analyze" -- or collect statistics about -- tables in your Big SQL database. These
statistics influence query optimization, enabling the Big SQL query engine to select an efficient data
access path to satisfy your query. Collecting and maintaining accurate statistics is important when
dealing with large volumes of data.
__1. Display information about your Hadoop tables as you did in a previous lab exercise. (Click
Administer > Hadoop Tables).
__2. Locate the SLS_PRODUCT_DIM table and click on the box beside it.
__3. Click the Analyze button at top.
__4. Inspect the new menu that appears, including the command pane at bottom.
__5. Customize your work as shown below. In the Columns box, select Full. Then select the
columns for which you want to collect statistics. To do so, drag the following columns from the
Available Columns pane to the pane beside it: product_key, product_number,
product_line_code, product_brand_code. Note how the command in the Command pane at
bottom has changed.
__6. Click Run.
__7. Inspect the results at bottom and verify that the commands completed successfully.
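The command that DSM builds in step 5 is a Big SQL ANALYZE statement. Its general shape is sketched below for the four columns chosen in this exercise. The text shown in your Command pane may differ in options and formatting, and SARACCO is simply the schema used in this lab's examples. The follow-up query is an optional check that assumes column statistics are exposed through the SYSCAT.COLUMNS catalog view.

```sql
-- Collect table-level and column-level statistics for the selected columns.
ANALYZE TABLE saracco.sls_product_dim
  COMPUTE STATISTICS
  FOR COLUMNS product_key, product_number, product_line_code, product_brand_code;

-- Afterward, check the column cardinalities recorded in the catalog.
-- Assumption: columns without collected statistics report COLCARD as -1.
SELECT colname, colcard
FROM syscat.columns
WHERE tabschema = 'SARACCO'
  AND tabname   = 'SLS_PRODUCT_DIM'
ORDER BY colname;
```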
5.2. Using EXPLAIN
The EXPLAIN feature enables you to inspect the data access plan selected by the Big SQL optimizer for
your query. Such information is highly useful for performance tuning. This exercise introduces you to
the EXPLAIN feature in DSM.
If you already created EXPLAIN tables as part of the separate lab exercise on data access plans
included in the Getting Started with Big SQL lab (http://www.slideshare.net/CynthiaSaracco/big-sql40-hol)
or if you are using the enterprise BigInsights for Apache Hadoop service on Bluemix, skip the first 2
steps below.
__1. From the SQL editor, create the necessary EXPLAIN tables to hold information about your query
plans by calling the SYSINSTALLOBJECTS procedure. The example below shows one way in
which this procedure can be invoked and presumes you have “bigsql” administrative authority.
Casting a NULL in the last parameter causes a single set of EXPLAIN tables to be created in
schema SYSTOOLS, which can be made accessible to all users.
CALL SYSPROC.SYSINSTALLOBJECTS('EXPLAIN', 'C', CAST (NULL AS VARCHAR(128)), CAST (NULL AS VARCHAR(128)));
__2. Authorize all Big SQL users to read data from the SYSTOOLS.EXPLAIN_INSTANCE table
created by the stored procedure you just executed.
grant select on systools.explain_instance to public;
__3. Paste or type the following query into the SQL editor but do not run it. Alter the table’s schema
name as needed to match your environment.
select distinct product_key, introduction_date
from saracco.sls_product_dim;
__4. Click Explain.
If you’re using the Bluemix service and receive a message that the environment needs to be
configured to support Explain operations, follow the instructions to complete the configuration
and then proceed. Note that you may need to do so while connected as “biadmin”. If you need
additional help, contact Bluemix support.
__5. Examine the output. If desired, hover over each box in the data access plan to review further
details. Icons in the Query tab enable you to zoom in and out, scroll through the plan, and
perform other functions.
The plan for this query will be different from the sample shown here if you created a primary key
constraint on the PRODUCT_KEY column, as discussed in a separate lab. (With such a
constraint, Big SQL will automatically determine that it does not need to sort the data to eliminate
duplicates. Click on the SQL Statement menu option at top to compare the original query to the
optimized version.)
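Plan information can also be captured with the EXPLAIN statement instead of the Explain button. The following is a minimal sketch, assuming that the EXPLAIN tables exist in the SYSTOOLS schema (as created in step 1) and that your ID is allowed to write to them. It records the plan without executing the query, then reads back the SYSTOOLS.EXPLAIN_INSTANCE table on which step 2 granted SELECT.

```sql
-- Populate the EXPLAIN tables for the query without running it.
EXPLAIN PLAN FOR
  select distinct product_key, introduction_date
  from saracco.sls_product_dim;

-- Confirm that an explain request was recorded (most recent first).
SELECT explain_requester, explain_time, source_name, source_schema
FROM systools.explain_instance
ORDER BY explain_time DESC
FETCH FIRST 5 ROWS ONLY;
```

This only confirms that a plan was stored; the graphical view in DSM remains the easier way to read the plan itself.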
Lab 6 Summary
Congratulations! You’ve just learned several features of Data Server Manager, IBM’s Web tooling for Big
SQL. To expand your skills and learn more, consult the product's online documentation. In addition, visit
the Hadoop Dev web site (https://developer.ibm.com/hadoop/) for links to tutorials, blogs, and other
technical resources related to BigInsights and Hadoop.