These are the presentation slides from JJUG CCC 2019 Fall.
They summarize the experience and know-how gained from developing a small-scale Web API server with OpenAPI Generator.
https://ccc2019fall.java-users.jp/
https://jjug-cfp.cfapps.io/submissions/92e3117f-d911-4674-b97b-581813cfa0dc
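DB TechShowcase Tokyo - Intelligent Data Platform (Daiyu Hatakeyama)
As AI (Artificial Intelligence) begins to be built into a wide range of applications and services, the data platform, which can be called the driving force behind it, is also changing its role. The upcoming SQL Server 2017 ships with Machine Learning Services and can fairly be called one form of the next-generation data platform. This session introduces the latest Data Platform world, whose value is shifting from System of Record to System of Insight.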
NTT Communications developed a Hadoop-based log analysis system for marketing. The system analyzes access logs, query logs, click logs, and CGM data to extract Internet users' interest in, and feedback on, specific products and services. It supports three types of analysis: (1) reputation analysis, (2) related-word analysis, and (3) user interest estimation. In addition to the features of this system, the talk describes how to reduce the shuffle size by strengthening the Map phase, and the characteristics of our Hadoop cluster.
8. #cmdevio #cmdevioK
Data management + data catalog
Data preparation + analysis + model building
Sharing + scaling + automation
Deployment + real-time scoring
DATA SCIENCE & ANALYTICS CULTURE
COMMUNITY
CONNECT DESIGNER SERVER PROMOTE
We offer a product lineup that lets data analysis take hold across the whole organization: business users, IT departments, and data scientists.
A platform for getting the most out of your data
9. #cmdevio #cmdevioK
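Every step up to analysis on a single platform
Data blending / Predictive & spatial analytics / Output creation
Data access, preparation, cleansing
Data integration tools / Coding for analysis / Application building / Data query tools
IT department / Data scientists / Line-of-Business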
Alteryx: Intuitive and Comprehensive Self-Service Experience
Connect to Data Select Data Summarize Data Analyze Data Output Data
Connect to Data Sort and Filter Data Report on Data Browse Data
Blend Data
11. #cmdevio #cmdevioK
PREP + ANALYZE + MODEL
DESIGNER
Reference URL: https://dev.classmethod.jp/business/business-analytics/alteryx-product-overview-alteryx-designer/
Data prep, analysis, and modeling
Alteryx Designer
12. #cmdevio #cmdevioK
SHARE + SCALE + GOVERN
SERVER
Reference URL: https://dev.classmethod.jp/business/business-analytics/alteryx-product-overview-alteryx-server/
Automated execution, logic sharing, scaling for large workloads, governance
Alteryx Server
13. #cmdevio #cmdevioK
DISCOVER + COLLABORATE
CONNECT
Reference URL: https://dev.classmethod.jp/business/business-analytics/alteryx-product-overview-alteryx-connect/
Make data inside the organization easier to discover; collaboration across departments
Alteryx Connect (Alteryx Server Add-on)
14. #cmdevio #cmdevioK
DEPLOY + MANAGE
PROMOTE
Reference URL: https://dev.classmethod.jp/business/business-analytics/alteryx-product-overview-alteryx-promote/
Deploying and managing analytic models
Alteryx Promote (Alteryx Server Add-on)
Review how data is processed in a workflow:
Data is read in on execution; nothing happens to the original data source
Setting Type in the Select tool means that:
Data is not evaluated
Data that doesn’t fit the new type is replaced with [Null]
What is field size?
String: width in number of characters
Setting the Size for a string sets the maximum number of characters the field can hold, counted from left to right
Data that is too long is truncated
Number: bytes
For most numeric types the Size is not configurable; see Help for details
Data that is too long is rounded (usually)
Fixed Decimal is the exception: you specify the size as the length of the number, then a dot, then the number of decimal places
Why enforce a field size?
Reserve memory for processing the data
Note that size changes in the Select tool show up highlighted in pink. You can use the Select tool's Options menu to reset the fields to their original values, that is, their incoming values.
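The type and size rules above can be pictured with a small Python analogy. This is only a sketch of the behavior described in these notes, not how Alteryx implements it; the helper names `to_int_or_null` and `fit_string` are made up for illustration.

```python
from typing import Optional


def to_int_or_null(value: str) -> Optional[int]:
    """Convert text to an integer; a value that does not fit becomes None ([Null])."""
    try:
        return int(value)
    except ValueError:
        return None


def fit_string(value: str, size: int) -> str:
    """Keep at most `size` characters, counted from the left; the rest is truncated."""
    return value[:size]


# The incoming data is only read; the conversions build new values.
raw_quantities = ["12", "7", "n/a"]
converted = [to_int_or_null(v) for v in raw_quantities]

city = fit_string("San Francisco", 8)

print(converted)       # [12, 7, None]
print(city)            # San Fran
print(raw_quantities)  # ['12', '7', 'n/a'] (source unchanged)
```

The value "n/a" cannot be read as a number, so it ends up as a null rather than stopping the run, and the city name loses everything past the eighth character.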
The basic filter can only test for a single condition, and the types of condition are limited. Sooner or later you will run into a use case where you need to filter on values that can’t be configured in the basic filter.
The Filter Expression box uses Boolean logic; that is, Alteryx evaluates the expression to see whether it is “True” or “False”.
Trivia note: George Boole, for whom Boolean logic is named, is known for expressing logic in algebraic form.
Unlike Excel, which hides data when we filter, Alteryx divides the data into two streams:
True records pass to the ‘T’ anchor
False records pass to the ‘F’ anchor
Custom filters are filters you write yourself, using Variables and Functions. Choosing the custom filter gives access to the expression builder for more complex conditions, where you can include multiple tests and/or use a large library of logic (Case logic, string functions, datetime functions).
It is a best practice to design the expression so that the records you want to work with result in a True answer.
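The two-stream idea can be sketched in Python as follows. This is an analogy only; `split_by_filter` and the sample `orders` records are hypothetical, and the condition is written so that the records we want to keep evaluate to True.

```python
from typing import Any, Callable, Dict, List, Tuple

Record = Dict[str, Any]


def split_by_filter(
    records: List[Record], condition: Callable[[Record], bool]
) -> Tuple[List[Record], List[Record]]:
    """Send each record to the True stream or the False stream; nothing is hidden or dropped."""
    true_stream: List[Record] = []
    false_stream: List[Record] = []
    for record in records:
        if condition(record):
            true_stream.append(record)
        else:
            false_stream.append(record)
    return true_stream, false_stream


orders = [
    {"region": "West", "sales": 1200},
    {"region": "East", "sales": 300},
    {"region": "West", "sales": 80},
]

# A custom condition that combines two tests, which a single basic filter cannot do.
t_anchor, f_anchor = split_by_filter(
    orders, lambda r: r["region"] == "West" and r["sales"] > 100
)

print(len(t_anchor), len(f_anchor))  # 1 2
```

Every incoming record lands in exactly one of the two outputs, so the False stream is still available for a separate branch of the workflow.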
Overview of the basic building blocks for building a formula
The formula tool is one of the most used tools in Alteryx workflows
In the expression box we can enter variables, functions, text (enclosed in quotes) or numbers
Output Fields:
Enter name to create new fields
Select field to modify value
Processed in order
Tabs – Formula building blocks:
Variables – Fields in same row
Functions – Actions used in expressions
Expression: Formula used to create or modify the selected Output Field
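The "processed in order" point is easiest to see in a small sketch. The Python below is an analogy under assumptions, not Alteryx expression syntax: `apply_formulas` is a hypothetical helper, and each entry pairs an output field with an expression that reads fields from the same row.

```python
from typing import Any, Callable, Dict, List, Tuple

Row = Dict[str, Any]


def apply_formulas(row: Row, formulas: List[Tuple[str, Callable[[Row], Any]]]) -> Row:
    """Evaluate expressions top to bottom; later ones see the results of earlier ones."""
    result = dict(row)  # work on a copy so the incoming row is untouched
    for output_field, expression in formulas:
        result[output_field] = expression(result)
    return result


formulas = [
    # New name: creates a field.
    ("Total", lambda r: r["Price"] * r["Quantity"]),
    # Uses Total, which only exists because the expression above ran first.
    ("Label", lambda r: r["Product"] + ": " + str(r["Total"])),
    # Existing name: modifies the value in place.
    ("Product", lambda r: r["Product"].upper()),
]

row = {"Product": "widget", "Price": 2.5, "Quantity": 4}
out = apply_formulas(row, formulas)

print(out["Total"])    # 10.0
print(out["Label"])    # widget: 10.0
print(out["Product"])  # WIDGET
```

Swapping the first two entries would fail, because Label would then be evaluated before Total exists.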
In this tool you can select all fields of a given type and also use the UNKNOWN option. Setting Unknown for a type means that if a new field of that type (in this case Text (String)) is introduced to the data set, the expression is applied to it as well. Clearing the UNKNOWN box means that only the string fields in the current data set are affected, even if new fields come into the set.
This tool has a feature that the Formula tool does not: it can change the data type on the fly. Unlike the Formula tool, where the data type of an existing field is already set, this tool allows you to change the type on output.
One checkbox that always gets me is Copy Output Fields. It creates new fields holding the changed data as the last columns in the data set. Remember to uncheck it when you are updating fields.
The variables in this tool are slightly different from those in the basic Formula tool, in that you can refer to the field generically using the current field option.
Note that our expression in this example will change all string fields to UPPERCASE.
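A rough Python analogy of this behavior is sketched below. It is not Alteryx's implementation: `multi_field_formula` is a hypothetical helper, the `New_` prefix is an assumed name for the copied fields, and picking fields by type at run time stands in for the UNKNOWN option (any string field that shows up later is picked up too).

```python
from typing import Any, Callable, Dict

Row = Dict[str, Any]


def multi_field_formula(
    row: Row,
    expression: Callable[[str], Any],
    copy_output_fields: bool = False,
    prefix: str = "New_",  # assumed prefix, for illustration only
) -> Row:
    """Apply one expression to every string field, referring to it generically."""
    result = dict(row)
    for name, value in row.items():
        if isinstance(value, str):  # selection by type, not by field name
            target = prefix + name if copy_output_fields else name
            result[target] = expression(value)
    return result


row = {"City": "denver", "State": "co", "Population": 715000}

updated = multi_field_formula(row, str.upper)
copied = multi_field_formula(row, str.upper, copy_output_fields=True)

print(updated)       # {'City': 'DENVER', 'State': 'CO', 'Population': 715000}
print(list(copied))  # ['City', 'State', 'Population', 'New_City', 'New_State']
```

With the copy option on, the original fields keep their values and the changed data is appended as extra columns at the end, which is why it needs unchecking when the goal is to update fields in place.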
Build an example of how a multi-row formula is used
Choose to modify a field or to create a new one
Variables
Row-n: rows before the current row
Field_Name: current row
Row+n: Rows following current row
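As a sketch of how the row references work, the Python below computes a month-over-month change by looking one row back. It is an analogy only: `multi_row_formula` is a hypothetical helper limited to one row before and one row after, and using 0 for rows that do not exist is an assumption made for this example.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


def multi_row_formula(
    rows: List[Row],
    field: str,
    new_field: str,
    expression: Callable[[Any, Any, Any], Any],
    missing: Any = 0,  # assumed value for rows before the first / after the last
) -> List[Row]:
    """Create `new_field` from the previous, current, and next value of `field`."""
    values = [row[field] for row in rows]
    output = []
    for i, row in enumerate(rows):
        previous_value = values[i - 1] if i > 0 else missing             # Row-1
        next_value = values[i + 1] if i < len(rows) - 1 else missing     # Row+1
        new_row = dict(row)
        new_row[new_field] = expression(previous_value, row[field], next_value)
        output.append(new_row)
    return output


sales = [
    {"Month": "Jan", "Sales": 100},
    {"Month": "Feb", "Sales": 130},
    {"Month": "Mar", "Sales": 90},
]

with_change = multi_row_formula(
    sales, "Sales", "Change", lambda prev, cur, nxt: cur - prev
)

print([row["Change"] for row in with_change])  # [100, 30, -40]
```

The first row has no row before it, so its result depends entirely on what is substituted for the missing row.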
Understand what the Summarize tool is and how it works
The Summarize tool allows you to summarize and aggregate your data in a number of ways (numerically, by concatenation, spatially)
For any column there is a range of actions available (based on field type), and the chosen action is applied to the entire column
The most common action is Group By, which creates in the output one record for each value of that field
Action is applied to an entire column of data
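Group By plus one action can be sketched in a few lines of Python. This is an analogy rather than the tool itself; `summarize` is a hypothetical helper and the sample rows are invented.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


def summarize(
    rows: List[Row], group_by: str, field: str, action: Callable[[List[Any]], Any]
) -> List[Row]:
    """Output one record per distinct group value, with the action applied to the whole column."""
    groups: Dict[Any, List[Any]] = defaultdict(list)
    for row in rows:
        groups[row[group_by]].append(row[field])
    return [{group_by: key, field: action(values)} for key, values in groups.items()]


rows = [
    {"Region": "West", "Rep": "Ann", "Sales": 100},
    {"Region": "East", "Rep": "Bob", "Sales": 50},
    {"Region": "West", "Rep": "Cy", "Sales": 25},
]

totals = summarize(rows, "Region", "Sales", sum)       # numeric action
names = summarize(rows, "Region", "Rep", ", ".join)    # concatenation action

print(totals)  # [{'Region': 'West', 'Sales': 125}, {'Region': 'East', 'Sales': 50}]
print(names)   # [{'Region': 'West', 'Rep': 'Ann, Cy'}, {'Region': 'East', 'Rep': 'Bob'}]
```

Three input records become two output records, one for each value of the Group By field.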
Overview of the Cross Tab tool and why it is used
Pivots vertical data onto a horizontal axis
Requires a header field and a data field
If there are multiple values for the same header name, the selected method determines how they are resolved
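The pivot can be sketched in Python as below. This is an analogy under assumptions: `cross_tab` is a hypothetical helper, the grouping field is added so that each group gets its own output row, and `sum` stands in for whichever method is selected to combine repeated values.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


def cross_tab(
    rows: List[Row],
    group_field: str,
    header_field: str,
    data_field: str,
    method: Callable[[List[Any]], Any] = sum,
) -> List[Row]:
    """Turn the values of `header_field` into columns filled from `data_field`."""
    headers: List[Any] = []
    cells: Dict[Any, Dict[Any, List[Any]]] = {}
    for row in rows:
        header = row[header_field]
        if header not in headers:
            headers.append(header)
        cells.setdefault(row[group_field], {}).setdefault(header, []).append(row[data_field])

    output = []
    for group, by_header in cells.items():
        wide_row: Row = {group_field: group}
        for header in headers:
            values = by_header.get(header)
            # Repeated values under one header are combined by the chosen method.
            wide_row[header] = method(values) if values else None
        output.append(wide_row)
    return output


long_rows = [
    {"Region": "West", "Quarter": "Q1", "Sales": 100},
    {"Region": "West", "Quarter": "Q2", "Sales": 150},
    {"Region": "West", "Quarter": "Q1", "Sales": 20},
    {"Region": "East", "Quarter": "Q1", "Sales": 70},
]

wide_rows = cross_tab(long_rows, "Region", "Quarter", "Sales")
print(wide_rows[0])  # {'Region': 'West', 'Q1': 120, 'Q2': 150}
print(wide_rows[1])  # {'Region': 'East', 'Q1': 70, 'Q2': None}
```

West has two Q1 records, so the method decides what goes in that single cell; East has no Q2 record, so that cell is left empty.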
Overview of the Transpose tool and why it is used
It is the reverse action of a cross tab
Key fields are fields chosen to ungroup
Data fields are fields chosen to pivot into a single field
A transpose has three pieces
Key fields – ungrouped fields
Name – the former field headers, now in a single field
Value – the data that used to sit below each of those headers
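The three pieces can be seen in a short Python sketch. It is an analogy only; `transpose` is a hypothetical helper, and the output field names `Name` and `Value` follow the notes above.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def transpose(rows: List[Row], key_fields: List[str], data_fields: List[str]) -> List[Row]:
    """Pivot the chosen data fields into Name/Value pairs, repeating the key fields."""
    output = []
    for row in rows:
        for field in data_fields:
            long_row = {key: row[key] for key in key_fields}
            long_row["Name"] = field        # the former column header
            long_row["Value"] = row[field]  # the data that sat under that header
            output.append(long_row)
    return output


wide_rows = [
    {"Region": "West", "Q1": 120, "Q2": 150},
    {"Region": "East", "Q1": 70, "Q2": 40},
]

long_rows = transpose(wide_rows, key_fields=["Region"], data_fields=["Q1", "Q2"])

for long_row in long_rows:
    print(long_row)
# {'Region': 'West', 'Name': 'Q1', 'Value': 120}
# {'Region': 'West', 'Name': 'Q2', 'Value': 150}
# {'Region': 'East', 'Name': 'Q1', 'Value': 70}
# {'Region': 'East', 'Name': 'Q2', 'Value': 40}
```

Two wide records with two data fields each become four long records, which is the reverse of what a cross tab does.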
For text files with the same structure we can actually read in multiple files without having to use a dynamic input tool. We can read in multiple files by modifying the File Path to have Wildcard characters. Alteryx allows you to use either an asterisk or a question mark.
* allows you to find files that match the file name pattern where zero or more characters are present in the part of the name with the asterisk
? allows you to find and load files that match the file name pattern where a single character differs between the file names
You can use any number or combination of asterisks and question marks to build your file name pattern. However, these wildcards are only allowed in the file name portion, and they can only be used on text files. The rules for reading multiple files are basically the same as for the Dynamic Input tool:
All data must have the same structure, that is, the same number of columns
All files must be text files (Excel IS NOT A TEXT FILE)
Files that don’t match the structure are skipped
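Python's `glob` module uses the same two wildcards, so the matching rules can be tried out with a short sketch. This is an analogy, not how Alteryx reads files: `read_matching` is a hypothetical helper, the file names are invented, and treating the first matching file's header as the expected structure is an assumption made for this example.

```python
import csv
import glob
import os
import tempfile
from typing import Dict, List, Tuple


def read_matching(pattern: str) -> Tuple[List[Dict[str, str]], List[str]]:
    """Read every delimited text file matching the pattern; skip files whose structure differs."""
    expected_header = None
    records: List[Dict[str, str]] = []
    skipped: List[str] = []
    for path in sorted(glob.glob(pattern)):
        with open(path, newline="") as handle:
            rows = list(csv.reader(handle))
        if not rows:
            continue
        header, body = rows[0], rows[1:]
        if expected_header is None:
            expected_header = header  # assumption: the first file defines the structure
        if header != expected_header:
            skipped.append(os.path.basename(path))
            continue
        records.extend(dict(zip(header, row)) for row in body)
    return records, skipped


with tempfile.TemporaryDirectory() as folder:
    sample_files = {
        "sales_2018.csv": "id,amount\n1,10\n",
        "sales_2019.csv": "id,amount\n2,20\n",
        "sales_2019_q1.csv": "id,amount,quarter\n3,30,Q1\n",  # extra column
    }
    for name, text in sample_files.items():
        with open(os.path.join(folder, name), "w", newline="") as handle:
            handle.write(text)

    # * matches zero or more characters in that part of the name.
    star_records, star_skipped = read_matching(os.path.join(folder, "sales_*.csv"))
    # ? matches exactly one character.
    question_records, question_skipped = read_matching(os.path.join(folder, "sales_201?.csv"))

print(len(star_records), star_skipped)          # 2 ['sales_2019_q1.csv']
print(len(question_records), question_skipped)  # 2 []
```

The asterisk pattern picks up all three files but the one with an extra column is skipped, while the question mark pattern never matches it in the first place.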