EMC Greenplum Management Enabled by Aginity Workbench
 


    • White Paper
      EMC GREENPLUM MANAGEMENT ENABLED BY AGINITY WORKBENCH
      A Detailed Review
      EMC SOLUTIONS GROUP
      Abstract
      This white paper discusses the features, benefits, and use of Aginity Workbench for EMC® Greenplum® – a comprehensive management and development tool, specially tailored for the features and architecture of the EMC Greenplum Database.
      August 2011
    • Copyright © 2011 EMC Corporation. All Rights Reserved.
      EMC believes the information in this publication is accurate as of its publication date. The information is subject to change without notice.
      The information in this publication is provided “as is.” EMC Corporation makes no representations or warranties of any kind with respect to the information in this publication, and specifically disclaims implied warranties of merchantability or fitness for a particular purpose.
      Use, copying, and distribution of any EMC software described in this publication requires an applicable software license.
      For the most up-to-date listing of EMC product names, see EMC Corporation Trademarks on EMC.com.
      All trademarks used herein are the property of their respective owners.
      Part Number: H8762
      EMC Greenplum Management Enabled by Aginity Workbench - A Detailed Review
    • Table of contents
      Executive summary
        Business case
        Solution overview
        Key benefits
      Introduction
        Purpose
        Scope
        Audience
        Terminology
      Technology overview
        Overview
        Aginity Workbench
        EMC Greenplum Database
      Configuration
        Overview
        Environment diagram
        Greenplum environment description
          EMC Greenplum Master Server
          EMC Greenplum Segment Servers
      Operational scenarios
        Overview
        List of scenarios
        Scenario 1: Browse objects in the Greenplum Database
        Scenario 2: Examine data distribution in the Greenplum Database
        Scenario 3: Identify poorly performing queries and optimize performance
        Scenario 4: Examine the status of Greenplum segments
        Scenario 5: Optimize space usage in a Greenplum Database
        Scenario 6: Examine roles and resource queues
        Scenario 7: Import or export data into or out of a database
      Conclusion
        Summary
        Findings
      References
        White papers
        Product documentation
        Other information
    • Executive summary
      Business case
      The EMC® Greenplum® Database is a high-performance data warehouse system that employs a massively parallel processing (MPP) architecture – many servers working in parallel on database tasks. While the details of the architecture and operation are largely hidden from database users, database administrators (DBAs) and developers often need access to these details to check system health, ensure optimal performance, and develop business analytics quickly and easily to derive value from the data in the warehouse. Standard query and DBA tools fall short of providing visibility into the features of parallel-processing architecture in general, and the unique features of the Greenplum Database in particular.
      Solution overview
      Aginity Workbench for EMC Greenplum (Aginity Workbench) offers a simple and efficient method of managing a Greenplum Database. Aginity Workbench gives you a single point of access to manage, monitor, and develop a Greenplum Database, by offering a range of tools and functions that look deep into the Greenplum architecture. With Aginity Workbench, you can:
      • Examine the operational status of all segments
      • Browse all objects in the Greenplum Database and make modifications
      • Run multiple queries and export results to common file formats including Microsoft Excel
      • Generate SQL and DDL with drag-and-drop ease
      • Analyze query plans
      • Quickly find tables that should be vacuumed to free up database resources
      • See how primary and mirror Segment Instances are distributed across the Segment Servers
      • Graphically view table distribution and easily spot distribution skew
      • Easily redistribute data
      Key benefits
      Aginity Workbench brings a new level of insight into the Greenplum Database that no other graphical user interface (GUI) tool can provide. Benefits of using Aginity Workbench include:
      • Ease of use - With a single access point from a user-friendly GUI, you require less time and effort to accomplish daily tasks with the Greenplum Database.
      • Access to individual components allows for detailed diagnostics - You can analyze, test, and reset the database servers more quickly, which reduces downtime.
      • Optimization of database performance - You can adjust the database settings to maximize its performance.
      • Reduction of user errors - Developers can use the built-in functions instead of user-written scripts, which reduces errors and time spent on scripting.
    • Introduction
      Purpose
      The purpose of this white paper is to examine the functionality of the Aginity Workbench and demonstrate the benefits of using it to access, manipulate, and monitor a Greenplum Database.
      Scope
      This white paper describes the features and benefits of using Aginity Workbench in a Greenplum Database environment and describes the functionality of the main features of the product. This white paper does not provide configuration information for installing Aginity Workbench into a Greenplum environment.
      Audience
      This white paper is intended for EMC employees, partners, customers, and anyone interested in using Aginity Workbench to manage a Greenplum Database.
      Terminology
      This white paper includes the following terminology.
      Table 1. Terminology
      Term | Definition
      Analytics | Analytics is the study of operational data using statistical analysis with a goal of identifying and using patterns to optimize business performance.
      Business intelligence | Business intelligence is the effective use of information assets to improve the profitability, productivity, or efficiency of a business. Frequently, IT professionals use this term to refer to the business applications and tools that enable such information usage.
      DDL | Data Definition Language is the syntax that is used to define and create objects in a relational database.
      Master Server | In an EMC Greenplum Database, the Master Server or Host controls the operation of the entire system and is the main connection point for external clients accessing the database. The Master Server distributes incoming queries to the Segment Servers, gathers the results, and returns them to the client.
      Massively parallel processing (MPP) | MPP is the coordinated processing of data by multiple machines that work together on a task. In a shared-nothing MPP architecture, such as EMC Greenplum, each machine has its own memory and storage and is not choked by negotiation of shared resources.
      Segment Server | In an EMC Greenplum Database, a Segment Server is one of the worker nodes/servers that is used to do the work in the MPP deployment.
      Shared-nothing architecture | Shared-nothing is a distributed computing architecture made up of a collection of independent, self-sufficient servers. This is in contrast to a traditional central computer that hosts all information and processing in a single location.
      SQL | Structured Query Language is the syntax that is used to access data from a relational database.
    • Technology overview
      Overview
      The primary components used in this environment are:
      • Aginity Workbench
      • EMC Greenplum Database
      Aginity Workbench
      Aginity Workbench makes developers and DBAs more productive by using tools that give new access and insight into the Greenplum Database and Greenplum Data Computing Appliance. Created by and for Aginity’s own developers, Aginity Workbench is a client-based application that communicates with the Greenplum Database and has a deep understanding of the Greenplum internal architecture.
      For developers, Aginity Workbench has an intuitive interface for creating, managing, and tracking both individual SQL queries and entire databases. Sophisticated tools help developers analyze and tune queries for maximum performance. Results can be easily viewed or exported to other formats, such as Microsoft Excel, for further use.
      For DBAs, Aginity Workbench provides graphical information on important properties such as node status, database size and bloat, and table distribution and skew. Built-in functions assist with generating the commands used to maintain and optimize the database operation and health.
      EMC Greenplum Database
      EMC Greenplum Database is a shared-nothing, MPP architecture that has been designed for business intelligence and analytical processing. In this architecture, each server node acts as a self-contained database management system that owns and manages a distinct portion of the overall data. The system automatically distributes data and parallelizes query workloads across all available hardware.
      The core shared-nothing MPP architecture enables massive data storage, loading, and processing with linear scalability. Adaptive services provide worldwide businesses with high availability, workload management, and online expansion of capacity. Key product features enable petabyte-scale loading, hybrid storage (row or column) to best fit the unique needs of each analytical use case, and embedded support for SQL, MapReduce, and programmable analytics. In addition, all major third-party analytic and administration tools are supported through standard client interfaces.
      The core principle of the EMC Greenplum Database is to move the processing dramatically closer to the data and its users. This effectively enables the computational resources to process every query in a fully parallel manner, use all storage connections simultaneously, and flow data efficiently between resources as the query plan dictates. The result is that complex processing can be pushed down in close proximity to the data for maximum efficiency and incredible performance.
    • Configuration
      Overview
      Aginity Workbench is a Microsoft Windows-based tool and can attach to any Greenplum Database. Aginity Workbench uses a native EMC Greenplum connection from the Microsoft Windows client to the Greenplum Database.
      Aginity Workbench is a .NET application and is currently supported on the following platforms:
      • Windows XP (32-bit)
      • Windows 7 (32-bit and 64-bit)
      • Windows Server 2003 (32-bit and 64-bit)
      • Windows Server 2008 (32-bit and 64-bit)
      Environment diagram
      In this white paper, several operational scenarios are described to show how the Aginity Workbench integrates with the Greenplum Database and makes it easier for you to manage the system. Figure 1 shows a generic Greenplum environment being managed by Aginity Workbench.
      Figure 1. Aginity Workbench in a generic Greenplum environment
      Greenplum environment description
      Aginity Workbench runs on a Windows client that has a connection to the Greenplum Master Server through the data center network. You can use Aginity Workbench to develop and analyze queries, as well as maintain and optimize the database.
      EMC Greenplum Master Server
      The Greenplum Master Server is the access point for all user requests to the Greenplum Database and it also handles all coordination of the Segment Servers.
      EMC Greenplum Segment Servers
      The Greenplum Segment Servers are the workers of the Greenplum Database and perform all MPP tasks.
    • Operational scenarios
      Overview
      This section details some common operational scenarios of the Aginity Workbench that you can use to manage the Greenplum Database.
      List of scenarios
      Aginity Workbench was exercised in the following scenarios:
      • Scenario 1: Browse objects in the Greenplum Database
      • Scenario 2: Examine data distribution in the Greenplum Database
      • Scenario 3: Identify poorly performing queries and optimize performance
      • Scenario 4: Examine the status of Greenplum segments
      • Scenario 5: Optimize space usage in a Greenplum Database
      • Scenario 6: Examine roles and resource queues
      • Scenario 7: Import or export data into or out of a database
      Scenario 1: Browse objects in the Greenplum Database
      The purpose of this scenario is to expand schemas to view tables, columns, views, stored procedures, and other database objects.
      A key function of any database tool is to simply allow browsing and examination of database objects. Aginity Workbench has a familiar tree structure to “walk” into the hierarchy of the database. Figure 2 shows the top-level view of a Greenplum Database showing the databases - and their sizes - in the system.
      Figure 2. Aginity Workbench tree structure
      Figure 3 shows a database expanded to display database objects. The view displays Greenplum-specific objects and information such as Partitions and the Distributed By clause in a table definition. This information is typically missed by tools that do not understand the Greenplum architecture.
      Figure 3. Expanded database showing database objects
      Each of the objects has a robust context menu that provides many useful functions that DBAs and developers can use to work more efficiently. Figure 4 shows the ability to quickly construct a Select statement for a particular table.
      Figure 4. Select statement script
      The resulting Select statement can be edited as desired and then executed. Additional menu selections will build Insert, Update, and Delete statements as well as the DDL commands to create the table. These commands can be sent to the workbench query window as well as to the clipboard for pasting into other programs. These shortcut functions are handy for both initial design as well as reverse engineering of existing designs.
      Note: Commands are only shown in the menu if they are relevant to the object.
      Scenario 2: Examine data distribution in the Greenplum Database
      The purpose of this scenario is to:
      • Check the data distribution of tables to determine how well the data is balanced across all the Segment Servers
      • Identify a poorly distributed table and redistribute the data for better query performance
    • Figure 5 shows a poor table distribution.
      Figure 5. Query results showing poor table distribution
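The kind of imbalance shown in Figure 5 can also be quantified outside the GUI. The following is a minimal Python sketch, not the method Aginity Workbench uses: it assumes you have already collected per-segment row counts (for example with a query grouped on Greenplum's gp_segment_id system column) and computes one possible skew measure. The function name, the skew formula, and the sample counts are illustrative assumptions.

```python
from typing import Dict

# Query template that returns one row count per segment for a table.
# gp_segment_id is a Greenplum system column; {table} is a placeholder.
SKEW_QUERY = (
    "SELECT gp_segment_id, count(*) FROM {table} "
    "GROUP BY gp_segment_id ORDER BY gp_segment_id;"
)


def distribution_skew(rows_per_segment: Dict[int, int]) -> float:
    """Return (max - mean) / max: 0.0 means perfectly even distribution.

    This formula is an assumption chosen for illustration; other skew
    measures (such as a coefficient of variation) are equally valid.
    """
    counts = list(rows_per_segment.values())
    if not counts or max(counts) == 0:
        return 0.0
    mean = sum(counts) / len(counts)
    return (max(counts) - mean) / max(counts)


# Hypothetical per-segment row counts for a four-segment system.
balanced = {0: 1000, 1: 1000, 2: 1000, 3: 1000}
skewed = {0: 3700, 1: 100, 2: 100, 3: 100}

print(distribution_skew(balanced))
print(round(distribution_skew(skewed), 2))
```

A value near zero suggests the distribution key spreads rows evenly; a value approaching one suggests most rows sit on a single segment, which then becomes the bottleneck for parallel queries.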
    • To change the table distribution, you need to choose the Change distribution option, under Advanced, as shown in Figure 6.
      Figure 6. Select Change distribution menu option
      As shown in Figure 7, you can choose one or more of the Available Columns by which to redistribute the table. In this example, proc_id was selected.
      While Aginity Workbench makes it easy to change the distribution key, it is up to you to choose the column (or columns) that will actually result in a better distribution of the data. Selecting multiple columns for a distribution key makes a composite key from those columns.
      Figure 7. Select redistribution criteria and execute command
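For readers who want to see what a redistribution command looks like, here is a small Python sketch that builds a Greenplum ALTER TABLE ... SET DISTRIBUTED BY statement. It is an illustration only: the exact command that Aginity Workbench generates is not reproduced in this paper, the helper name redistribute_sql is hypothetical, and the table name public.proc_events is invented (only the column proc_id comes from the example above).

```python
import re
from typing import Sequence

# Accept only simple unquoted identifiers, to avoid building unsafe SQL.
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def redistribute_sql(table: str, columns: Sequence[str]) -> str:
    """Build an ALTER TABLE statement that changes the distribution key.

    Passing more than one column produces a composite distribution key.
    """
    if not columns:
        raise ValueError("at least one distribution column is required")
    for name in [*table.split("."), *columns]:
        if not _IDENT.match(name):
            raise ValueError(f"unsupported identifier: {name!r}")
    return f"ALTER TABLE {table} SET DISTRIBUTED BY ({', '.join(columns)});"


# Hypothetical table name; proc_id is the column chosen in the example.
print(redistribute_sql("public.proc_events", ["proc_id"]))
```

The statement is only generated here, not executed, which mirrors the workflow described in this scenario: the command is reviewed before it is run because it rewrites all the data in the table.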
    • After clicking OK, Aginity Workbench provides you with the commands that perform the redistribution. As redistribution is a significant activity on all the data in a table, you must manually verify and start the execution of the command. Choosing Show Distribution again now shows the results of this redistribution activity. Figure 8 shows the successful completion of the table redistribution.
      Figure 8. Successful completion of redistribution showing good table distribution
      Scenario 3: Identify poorly performing queries and optimize performance
      The purpose of this scenario is to:
      • Identify poorly performing queries
      • Examine the Explain Plan for the query and determine the reason for the poor performance
      • Optimize the query and verify that it performs better
      To identify poorly performing queries, you go to the Object menu, and under Database choose Show Query History. Figure 9 shows the Query History window. It provides several filters to narrow down the list. The Duration column visualizes query duration, for ease of interpretation.
      Figure 9. Query History
      After a query is selected, the context menu enables you to choose Explain SQL Statement, which shows the full query and the query plan. It also provides the output of an Explain Analysis of the query.
      Figure 10 shows the Explain Plan for the selected query. However, for larger and more complex Explain Plans, it may be difficult to read through all the output.
      Figure 10. Explain Plan for the selected query
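To illustrate why long Explain Plans benefit from tooling, the sketch below parses plan text and lists the sequential scans it contains, ordered by estimated cost. It is a rough Python illustration, not a description of how Aginity Workbench processes plans. It assumes the common PostgreSQL-style text layout in which each node line carries a "(cost=start..total rows=N width=W)" suffix; the sample plan and the table names in it are invented.

```python
import re
from typing import List, NamedTuple


class PlanNode(NamedTuple):
    name: str
    total_cost: float  # cumulative estimate: includes the node's children
    rows: int


# Matches one plan node line, with or without the leading "->" marker.
_NODE = re.compile(
    r"^\s*(?:->\s*)?(?P<name>.+?)\s+"
    r"\(cost=\d+\.\d+\.\.(?P<total>\d+\.\d+) rows=(?P<rows>\d+) width=\d+\)"
)


def parse_plan(plan_text: str) -> List[PlanNode]:
    """Extract every node that carries a cost estimate from plan text."""
    nodes = []
    for line in plan_text.splitlines():
        m = _NODE.match(line)
        if m:
            nodes.append(PlanNode(m["name"], float(m["total"]), int(m["rows"])))
    return nodes


def sequential_scans(nodes: List[PlanNode]) -> List[PlanNode]:
    """Return the Seq Scan nodes, most expensive first (a simple heuristic)."""
    scans = [n for n in nodes if n.name.startswith("Seq Scan")]
    return sorted(scans, key=lambda n: n.total_cost, reverse=True)


# Invented sample plan, used only to exercise the parser.
SAMPLE_PLAN = """\
Gather Motion 4:1  (slice1; segments: 4)  (cost=0.00..1250.50 rows=100 width=8)
  ->  Hash Join  (cost=310.00..1200.25 rows=25 width=8)
        ->  Seq Scan on sales  (cost=0.00..800.00 rows=20000 width=8)
        ->  Hash  (cost=155.00..155.00 rows=2500 width=4)
              ->  Seq Scan on stores  (cost=0.00..155.00 rows=2500 width=4)
"""

for node in sequential_scans(parse_plan(SAMPLE_PLAN)):
    print(node.name, node.total_cost, node.rows)
```

Because each node's total cost includes its children, ranking all nodes by cost would simply surface the root; restricting the ranking to leaf scans is one simple way to find where the work originates.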
    • As shown in Figure 11, Aginity Workbench supports you by providing iterator output of the query. This option is available in the Context menu of the query.
      Figure 11. Explain Plan
      The iterators give much more detailed information for the steps of the Explain Plan. Iterators are available for queries that have been executed and captured in the Greenplum Performance Monitor Database.
      Figure 12 shows the Query Plan window with the query plan as a navigation tree in the left pane, and summary and detail information in the right panes. You can immediately see the steps that are color-highlighted, which indicates that these are possible causes of slow performance.
      Figure 12. Query Plan showing iterator details
      Without such easy-to-navigate, interactive support, it would be much more difficult to narrow down pain points in problematic queries this quickly and efficiently.
      Scenario 4: Examine the status of Greenplum segments
      The purpose of this scenario is to:
      • Determine the operational status of Greenplum segments
      • Determine the location of primary segments and their corresponding mirror segments
      • Identify primary segments that have failed over to their mirror segments
      • Observe the failback of mirror segments to the primary server when the Segment Server is restored to operation
      Managing a Greenplum Database means managing multiple database instances on multiple servers. Aginity Workbench supports you by providing Server Explorer. This gives a detailed view of the inner workings of the Greenplum architecture, which allows DBAs to easily visualize the system status. Server Explorer can be accessed from the Server Node in the navigation tree, as shown in Figure 13.
      Figure 13. Server Explorer
      Figure 14 shows the server in a healthy running state.
      Figure 14. Server Explorer showing a healthy status
      The left pane shows the Segment Servers in the cluster. The right pane shows the configuration of each Segment Instance on each Segment Server. Columns can easily be sorted by clicking on the title of a column.
      Color-highlighting is used to visualize the placement of the primary-mirror pairs. For each primary-mirror pair, there is one row that shows all the configuration details, for example, role, mode, status, host, and so on. The colors show how the primary Segment Instances of a server are spread over different Segment Servers.
      This overview immediately informs you that there are no failed segments and that each Segment Server has six primary and six mirror Segment Instances.
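The same health check can be approximated from the gp_segment_configuration catalog table. The Python sketch below is an assumption-laden illustration rather than a description of Server Explorer's internals: it takes rows already fetched from that table (as plain dictionaries), counts primary and mirror Segment Instances per host, and flags any instance that is not both up and synchronized. The single-letter codes follow the Greenplum 4.x catalog ('p'/'m' for role, 's' for synchronized mode, 'u' for up); the host names and rows are invented.

```python
from collections import Counter
from typing import Dict, List, Tuple

# content >= 0 excludes the master, which has content = -1.
SEGMENT_QUERY = (
    "SELECT content, role, mode, status, hostname "
    "FROM gp_segment_configuration WHERE content >= 0;"
)

Row = Dict[str, object]


def summarize(rows: List[Row]) -> Tuple[Counter, List[Row]]:
    """Count instances per (hostname, role) and list unhealthy instances."""
    per_host = Counter((r["hostname"], r["role"]) for r in rows)
    unhealthy = [r for r in rows if r["status"] != "u" or r["mode"] != "s"]
    return per_host, unhealthy


# Invented example: segment 1 has failed over, so its surviving instance
# runs as primary in change-tracking mode and its partner is down.
rows = [
    {"content": 0, "role": "p", "mode": "s", "status": "u", "hostname": "sdw1"},
    {"content": 0, "role": "m", "mode": "s", "status": "u", "hostname": "sdw2"},
    {"content": 1, "role": "p", "mode": "c", "status": "u", "hostname": "sdw1"},
    {"content": 1, "role": "m", "mode": "c", "status": "d", "hostname": "sdw2"},
]

per_host, unhealthy = summarize(rows)
print(dict(per_host))
print([r["content"] for r in unhealthy])
```

In a healthy system the unhealthy list is empty and every host reports the same number of primaries and mirrors, which is the textual equivalent of the view in Figure 14.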
    • If any Segment Instances are in a mode or status other than Synchronized or Up, this is highlighted as shown in Figure 15 and Figure 16.
      Figure 15. Server Explorer showing a failover
      Figure 16. Server Explorer showing resynchronization
      When you want to focus on a particular Segment Server, clicking its node name in the left pane filters the list to the segments on that server only.
      Scenario 5: Optimize space usage in a Greenplum Database
      The purpose of this scenario is to:
      • Determine space utilization of tables in the database
      • Find tables that have bloat caused by deletes that have not been vacuumed
      • Reduce system resource usage by easily executing vacuum statements on the database
      Periodic vacuuming of database tables helps ensure that the space occupied by deleted items is reclaimed and available for use for new data in the database.
      Aginity Workbench makes it very easy to find the space used by the tables in the database. When you right-click the database, you can choose the Database Maintenance option as shown in Figure 17.
      Figure 17. Database Maintenance
      This brings up a display of all the tables in that database and includes columns that show the Expected Bytes used, Actual Bytes used, Expired Bytes, and the Percent Unused.
      As shown in Figure 18, the Diagnostics Message column gives an indication of the amount of bloat in the table. Tables with high bloat (deleted objects whose space can be reclaimed) can be easily vacuumed right from the menu.
      Figure 18. Diagnostics Message showing bloat
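The relationship between expected and actual bytes can be turned into a simple rule for choosing vacuum candidates. The Python sketch below is one possible interpretation, offered as an assumption: the paper does not state how the Percent Unused column is calculated, so the formula, the 50 percent threshold, and the table names and sizes here are all illustrative.

```python
from typing import Iterable, List, Tuple


def percent_unused(expected_bytes: int, actual_bytes: int) -> float:
    """Share of the table's actual size that exceeds its expected size.

    Assumed formula for illustration; not taken from Aginity Workbench.
    """
    if actual_bytes <= 0:
        return 0.0
    return max(0.0, 100.0 * (actual_bytes - expected_bytes) / actual_bytes)


def vacuum_statements(
    tables: Iterable[Tuple[str, int, int]], threshold_pct: float = 50.0
) -> List[str]:
    """Return a VACUUM statement for each table at or above the threshold.

    Each input tuple is (table name, expected bytes, actual bytes).
    """
    return [
        f"VACUUM {name};"
        for name, expected, actual in tables
        if percent_unused(expected, actual) >= threshold_pct
    ]


# Hypothetical tables and sizes.
tables = [
    ("public.orders", 1000, 4000),     # 75% unused: vacuum candidate
    ("public.customers", 900, 1000),   # 10% unused: left alone
]

print(vacuum_statements(tables))
```

As with redistribution, the statements are only generated so that they can be reviewed before being run against the database.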
    • Scenario 6: Examine roles and resource queues
      The purpose of this scenario is to:
      • Examine the properties of resource queues
      • Identify the resource queues to which roles are assigned
      An important aspect of Greenplum performance management is the notion of roles and resource queues. Roles roughly correspond to database users, and each user or role is assigned to a particular resource queue. Resource queues have associated properties that determine how much of the Greenplum system resources are applied to queries that run in those queues. Aginity Workbench can display the properties of resource queues as shown in Figure 19.
      Figure 19. Resource queues and user roles
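Resource queue properties of the kind displayed in Figure 19 correspond to attributes of Greenplum's CREATE RESOURCE QUEUE statement, and role assignments to ALTER ROLE ... RESOURCE QUEUE. The Python sketch below builds both statements as strings. It is a hedged illustration: the helper names, the queue name adhoc, and the role name analyst are hypothetical, and only a subset of queue attributes is covered.

```python
from typing import Optional

_PRIORITIES = {"MIN", "LOW", "MEDIUM", "HIGH", "MAX"}


def create_queue_sql(
    name: str,
    active_statements: Optional[int] = None,
    max_cost: Optional[float] = None,
    priority: Optional[str] = None,
) -> str:
    """Build CREATE RESOURCE QUEUE with a statement limit and/or cost limit."""
    if active_statements is None and max_cost is None:
        # A queue needs at least one of the two limits.
        raise ValueError("set active_statements, max_cost, or both")
    attrs = []
    if active_statements is not None:
        attrs.append(f"ACTIVE_STATEMENTS={active_statements}")
    if max_cost is not None:
        attrs.append(f"MAX_COST={max_cost}")
    if priority is not None:
        if priority.upper() not in _PRIORITIES:
            raise ValueError(f"unknown priority: {priority!r}")
        attrs.append(f"PRIORITY={priority.upper()}")
    return f"CREATE RESOURCE QUEUE {name} WITH ({', '.join(attrs)});"


def assign_role_sql(role: str, queue: str) -> str:
    """Build the statement that assigns a role to a resource queue."""
    return f"ALTER ROLE {role} RESOURCE QUEUE {queue};"


# Hypothetical queue and role names.
print(create_queue_sql("adhoc", active_statements=3, priority="low"))
print(assign_role_sql("analyst", "adhoc"))
```

The two optional limits reflect the two kinds of queue distinguished in this scenario: queues that cap the number of active statements and queues that cap the estimated query cost.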
    • Aginity Workbench understands the difference between resource queues with active statement limits and resource queues that have maximum query cost limits. It also understands the different priorities that resource queues can have.
      Aginity Workbench also displays properties of user roles, and can show the resource queue to which each role or user is assigned, as shown in Figure 19.
      This easy access to workload management information helps DBAs properly allocate system resources so that database jobs are executed with the greatest efficiency.
      Scenario 7: Import or export data into or out of a database
      The purpose of this scenario is to:
      • Import data from a disk file to the database
      • Export data from the database to a disk file
      Moving data into a database from a flat file (TXT or CSV), and exporting data from a table into a flat file, are common actions for developers as well as DBAs. Greenplum provides the SQL COPY command, which can load an entire file into the database, and is considerably more efficient than executing INSERT statements and much easier than writing a script to load data. Unfortunately, the syntax for the SQL COPY command is a little tricky and, unless you use it every day, easy to forget or enter incorrectly.
      Aginity Workbench provides an easy way of importing data into the database from flat files and also exporting data from a table back to a disk file.
      To import data from a CSV file, you right-click the table into which you want to load the data and choose Import Data. In Import Data, as shown in Figure 20, you can specify the location of the file and the format. You can also specify the encoding, delimiters, escape characters, whether the input file has a header row, as well as the Segment reject limit. The reject limit sets the number of errors in the input file that you are willing to accept before aborting the load.
      Figure 20. Import Data
      As shown in Figure 21, the SQL tab shows the corresponding SQL COPY command that is generated, which can be edited further.
      Figure 21. SQL tab in Import Data window
      Getting data out of the database and into flat files is just as easy; you right-click the table and choose Export Data.
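To give a sense of the COPY syntax that the Import Data window hides, the following Python sketch assembles a Greenplum COPY ... FROM statement from a few of the options mentioned above (delimiter, header row, and segment reject limit). It is an assumption-based illustration: the command that Aginity Workbench actually generates may differ, the helper name copy_from_sql is hypothetical, and the table name and file path are invented.

```python
from typing import Optional


def copy_from_sql(
    table: str,
    path: str,
    delimiter: str = ",",
    header: bool = True,
    reject_limit: Optional[int] = None,
) -> str:
    """Build a COPY statement that loads a CSV file into a table.

    With COPY ... FROM 'file', the file is read on the database host,
    so the path must be visible to the Master Server.
    """
    safe_path = path.replace("'", "''")        # escape single quotes
    safe_delim = delimiter.replace("'", "''")
    parts = [f"COPY {table} FROM '{safe_path}' WITH DELIMITER '{safe_delim}' CSV"]
    if header:
        parts.append("HEADER")
    if reject_limit is not None:
        # Abort the load once this many rows have been rejected.
        parts.append(f"SEGMENT REJECT LIMIT {reject_limit} ROWS")
    return " ".join(parts) + ";"


# Hypothetical table and file path.
print(copy_from_sql("public.sales", "/data/sales.csv", reject_limit=10))
```

Generating the statement as text and reviewing it before execution matches the role of the SQL tab shown in Figure 21, where the generated command can be edited further.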
    • In Export Data, as shown in Figure 22, on the Parameters tab, you can specify many of the same kinds of properties as for importing data. The Selection tab allows you to specify the columns you want to export as well as an order-by clause for your desired sorting order.
      Figure 22. Export Data
      While the import and export functions do not use the Greenplum gpload/gpfdist programs for parallel bulk loading of extremely large amounts of data, these functions are very handy for quickly getting smaller amounts of data into and out of the database.
    • Conclusion
      Summary
      Aginity Workbench integrates easily with EMC Greenplum Database and allows you to quickly and efficiently manage, monitor, and access large-scale enterprise data warehouses.
      Findings
      Aginity Workbench features and functionality provide many benefits, including:
      • Ease of use, reduction of overhead, and improved return on investment
      • Access to individual components in the database, which allows for detailed diagnostics and fine tuning
      • Optimization of database performance
      • Reduction of errors and downtime
      Aginity Workbench is unmatched in its ability to expose the internals of the Greenplum Database and optimize the database with ease.
    • References
      White papers
      For additional information, see the white papers listed below.
      • EMC Greenplum Data Computing Appliance: High Performance for Data Warehousing and Business Intelligence - An Architectural Overview
      • EMC Greenplum Database 4.0 - Critical Mass Innovation
      Product documentation
      For additional information, see the product document listed below.
      • Greenplum Database 4.1 Administrator Guide
      Other information
      For additional information and to download the software, see the websites listed below.
      • Aginity.com
      • Greenplum.com