Smart data integration for a hybrid data analysis infrastructure (DataWorks Summit)
Improving customer value and corporate competitiveness requires advanced analytics on big data, including data from core systems, together with digital transformation.
At the same time, hybrid deployments combining on-premises systems and the cloud are becoming more common.
In this session, we introduce the technology and the latest case studies of applying real-time replication from core systems (RDBMS) to a hybrid-configuration Hadoop data analysis infrastructure (Hadoop data lake).
Big Data Visual Analytics Realized by Hadoop and Tableau (DataWorks Summit)
Tableau makes it easy for business users to quickly find valuable insights in huge Hadoop datasets. Even without advanced knowledge of a query language, its lean visual analysis interface makes big data approachable for more people. The session is interleaved with a demo.
About NEC Corporation:
Without waiting for digitization to run its course, how can we master the latest technology and find new knowledge in all kinds of data? We consider how to establish this as a continuous effort that enhances global competitiveness, drawing on hands-on experience and the latest products.
NTT Communications' Initiatives to Utilize Infrastructure Data (DataWorks Summit)
We report on how our big data analysis platform is being used for infrastructure data, centering on initiatives at the Next Generation Platform Promotion Office.
A must-see for beginners! The future that learning Hadoop opens up (DataWorks Summit)
What is "Hadoop" now? It is difficult to hear ... But those who are interested, those who are thinking about the future as active as a data engineer, those who are new to the first time, through introductions of Hadoop and the surrounding ecosystem, introducing merits and examples, "What now Should I learn? "And I will introduce the future spreading through learning Hadoop and the surrounding ecosystem.
The road to a smart factory armed with data utilization (DataWorks Summit)
In this presentation, we look at what the coming smart factory will look like, through an introduction to the integrated plant maintenance solution our company provides. The solution connects workers at the manufacturing site with various applications that improve the safety and efficiency of facilities and processes, and we outline the data utilization, project management, and platform architecture essential to it.
From data catalog to preparation. The latest data platform to accelerate the ... (DataWorks Summit)
New data governance is necessary in the big data era.
We introduce a comprehensive big data solution equipped with the AI engine "CLAIRE": enterprise-wide data cataloging, data preparation that supports self-service analysis, and a development and execution environment that makes it easy to use Hadoop engines.
This is the IBM session material from Red Hat Forum 2014:
"A high-speed Java processing OpenStack platform that enables immediate utilization of big data"
http://redhatforum2014.jp/
https://redhatmktg.smktg.jp/public/session/view/18
The document discusses Marketo's efforts to rebuild its web tracking infrastructure to greatly increase its capabilities and scale. The legacy system could only handle 2 million activities per day and had problems with delays and lack of flexibility. The new Orion initiative aims to support billions of daily activities with near real-time processing using a distributed architecture with Apache Spark, Kafka, and HBase on Hadoop. The initial results included supporting a key customer's increase from 2 million to over 20 million activities per day with latencies under 30 seconds.
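The ingestion pattern described, activities flowing through Kafka into Spark before landing in HBase, can be illustrated with a short sketch. This is not Marketo's code: the broker address and topic name are invented, and today's Structured Streaming API is used for brevity (it requires the spark-sql-kafka package on the classpath).

    # A hedged sketch of a Kafka -> Spark ingestion stage; an Orion-style
    # pipeline would upsert each micro-batch into HBase instead of the console.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("activity-ingest").getOrCreate()

    activities = (spark.readStream
        .format("kafka")
        .option("kafka.bootstrap.servers", "broker1:9092")  # hypothetical broker
        .option("subscribe", "web-activities")              # hypothetical topic
        .load()
        .selectExpr("CAST(value AS STRING) AS activity"))

    query = (activities.writeStream
        .format("console")   # stand-in for an HBase writer
        .start())
    query.awaitTermination()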
This document discusses using Apache Spark and Amazon DSSTNE to generate product recommendations at scale. It summarizes that Amazon uses Spark and Zeppelin notebooks to allow data scientists to develop queries in an agile manner. Deep learning jobs are run on GPUs using Amazon ECS, while CPU jobs run on Amazon EMR. DSSTNE is optimized for large sparse neural networks and allows defining networks in a human-readable JSON format to efficiently handle Amazon's large recommendation problems.
This document provides an overview and crash course on Apache Spark and related big data technologies. It discusses the history and components of Spark including Spark Core, SQL, Streaming, and MLlib. It also discusses data sources, challenges of big data, and how Spark addresses them through its in-memory computation model. Finally, it introduces Apache Zeppelin for interactive notebooks and the Hortonworks Data Platform sandbox for experimenting with these technologies.
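As a taste of the in-memory computation model the crash course highlights, here is a minimal PySpark sketch: cache a dataset once, then reuse it across several actions without re-reading the source.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("crash-course").getOrCreate()

    df = spark.range(1_000_000).withColumnRenamed("id", "n")
    df.cache()                              # keep the data in executor memory

    print(df.count())                       # first action materializes the cache
    print(df.filter("n % 2 = 0").count())   # later actions reuse it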
This document summarizes Coca-Cola East Japan's journey with Hadoop and data analytics. It discusses:
1) Coca-Cola East Japan's background and data landscape prior to Hadoop, which involved data silos and batch-oriented processing.
2) The phases of Coca-Cola East Japan's Hadoop implementation from a pilot project in 2015 to a production environment in 2016 with 13 nodes storing 20TB of data.
3) Examples of Hadoop projects including vending machine replenishment forecasting and a write-off reporting project to aggregate data from multiple sources.
4) Future plans to improve data collection and establish a true data lake, and develop data-driven decision
This document compares the performance of Hive and Spark SQL for processing smart meter data from electric utilities. Spark SQL showed significantly better performance than Hive, being able to process a year's worth of data from 10 million smart meters within 30 minutes. The use of ORCFile format and partitioning the data by individual equipment such as transformers improved performance. Spark SQL was well-suited for this near real-time use case due to its distributed in-memory computations.
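A hedged sketch of the layout credited for the speedup: readings stored as ORC, partitioned by equipment, then aggregated with Spark SQL. The column names and paths here are assumptions for illustration, not the presenters' schema.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("smart-meter").getOrCreate()

    raw = spark.read.csv("/data/raw_readings", header=True, inferSchema=True)

    # Partitioning by transformer lets a query for one piece of equipment skip
    # every other partition; ORC adds columnar compression and built-in indexes.
    raw.write.partitionBy("transformer_id").orc("/data/readings_orc")

    spark.read.orc("/data/readings_orc").createOrReplaceTempView("readings")
    spark.sql("""
        SELECT transformer_id, reading_date, SUM(kwh) AS daily_kwh
        FROM readings
        GROUP BY transformer_id, reading_date
    """).show()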
The document discusses the future of data and modern data applications. It notes that data is growing exponentially and will reach 44 zettabytes by 2020. This growth is driving the need for new data architectures like Apache Hadoop which can handle diverse data types from sources like the internet of things. Hadoop provides distributed storage and processing to enable real-time insights from all available data.
This document provides an overview of Apache NiFi and data flow fundamentals. It begins with an introduction to Apache NiFi and outlines the agenda. It then discusses data flow and streaming fundamentals, including challenges in moving data effectively. The document introduces Apache NiFi's architecture and capabilities for addressing these challenges. It also previews a live demo of NiFi and discusses the NiFi community.
Yahoo Japan transitioned their Hadoop cluster network architecture over time to address problems and scale needs. They moved from a stack architecture to an L2 fabric to an IP CLOS architecture. The IP CLOS architecture improved scalability, high availability, and reduced operating costs by allowing over 10,000 nodes with 100-200Gbps uplinks per rack and an oversubscription ratio of 1.25:1. This solved problems around switch failures, BUM traffic loads, decommissioning limitations, and scale-out limits they previously faced.
The document discusses how Apache Ambari can be used to streamline Hadoop DevOps. It describes how Ambari can be used to provision, manage, and monitor Hadoop clusters. It highlights new features in Ambari 2.4 like support for additional services, role-based access control, management packs, and Grafana integration. It also covers how Ambari supports automated deployment and cluster management using blueprints.
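Blueprints are plain JSON posted to Ambari's REST API. A hedged sketch of that flow; the host names, credentials, and minimal component list are assumptions for illustration.

    import requests

    ambari = "http://ambari.example.com:8080/api/v1"
    auth = ("admin", "admin")
    headers = {"X-Requested-By": "ambari"}   # header required by the Ambari API

    # Register a blueprint: stack version plus host groups of components.
    blueprint = {
        "Blueprints": {"stack_name": "HDP", "stack_version": "2.5"},
        "host_groups": [{
            "name": "master",
            "cardinality": "1",
            "components": [{"name": "NAMENODE"}, {"name": "RESOURCEMANAGER"}],
        }],
    }
    requests.post(f"{ambari}/blueprints/demo",
                  json=blueprint, auth=auth, headers=headers)

    # Instantiate a cluster by mapping concrete hosts onto the host groups.
    cluster = {
        "blueprint": "demo",
        "host_groups": [{"name": "master",
                         "hosts": [{"fqdn": "master1.example.com"}]}],
    }
    requests.post(f"{ambari}/clusters/demo-cluster",
                  json=cluster, auth=auth, headers=headers)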
This document discusses ways to troubleshoot slow Hadoop jobs using metrics, logging, and tracing. It describes how to use the Ambari metrics system and Grafana dashboards to monitor metrics for clusters. It also explains how to leverage Hadoop logs and the YARN Application Timeline Service for logging and correlation across workloads. Finally, it presents Apache Zeppelin and analyzers for Hive, Tez, and YARN as tools for ad-hoc analysis to diagnose issues.
This document proposes a container-based sizing framework for Apache Hadoop/Spark clusters that uses a multi-objective genetic algorithm approach. It emulates container execution on different cloud platforms to optimize configuration parameters for minimizing execution time and deployment cost. The framework uses Docker containers with resource constraints to model cluster performance on various public clouds and instance types. Optimization finds Pareto-optimal configurations balancing time and cost across objectives.
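At the core of any such multi-objective search is a Pareto filter over (execution time, deployment cost) pairs. A minimal sketch of that filter on invented measurements:

    def pareto_front(configs):
        # A config is dominated if some other config is at least as fast AND
        # at least as cheap (and differs); dominated configs are dropped.
        front = []
        for name, t_s, cost in configs:
            dominated = any(t2 <= t_s and c2 <= cost and (t2, c2) != (t_s, cost)
                            for _, t2, c2 in configs)
            if not dominated:
                front.append((name, t_s, cost))
        return front

    measurements = [                      # invented (config, seconds, $ per run)
        ("4x m4.large",  620, 0.9),
        ("2x m4.xlarge", 410, 1.1),
        ("8x m4.large",  400, 1.8),
        ("2x c4.xlarge", 350, 1.5),
    ]
    print(pareto_front(measurements))     # the 8-node config is dominated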
Major advancements in Apache Hive towards full support of SQL compliance include:
1) Adding support for SQL2011 keywords and reserved keywords to reduce parser ambiguity issues.
2) Adding support for primary keys and foreign keys to improve query optimization, specifically cardinality estimation for joins.
3) Implementing set operations like INTERSECT and EXCEPT by rewriting them using techniques like grouping, aggregation, and user-defined table functions.
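Point 3 can be made concrete with a small sketch: the INTERSECT below is expressed as the grouping-and-aggregation rewrite the summary describes, run here on an in-memory SQLite database with invented tables.

    import sqlite3

    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE a(x); INSERT INTO a VALUES (1),(2),(2),(3);
        CREATE TABLE b(x); INSERT INTO b VALUES (2),(3),(3),(4);
    """)

    # SELECT x FROM a INTERSECT SELECT x FROM b, rewritten: tag each branch,
    # group by the value, and keep values that appear under both tags.
    rewritten = """
        SELECT x FROM (
            SELECT x, 0 AS tag FROM a GROUP BY x
            UNION ALL
            SELECT x, 1 AS tag FROM b GROUP BY x
        ) GROUP BY x HAVING COUNT(DISTINCT tag) = 2
    """
    print(con.execute(rewritten).fetchall())   # 2 and 3, matching INTERSECT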
The NameNode was experiencing high load and instability after being restarted. Graphs showed unknown high load between checkpoints on the NameNode. DataNode logs showed repeated 60000 millisecond timeouts in communication with the NameNode. Thread dumps revealed NameNode server handlers waiting on the same lock, indicating a bottleneck. Source code analysis pointed to repeated block reports from DataNodes to the NameNode as the likely cause of the high load.
This document summarizes a presentation about new features in Apache Hadoop 3.0 related to YARN and MapReduce. It discusses major evolutions like the re-architecture of the YARN Timeline Service (ATS) to address scalability, usability, and reliability limitations. Other evolutions mentioned include improved support for long-running native services in YARN, simplified REST APIs, service discovery via DNS, scheduling enhancements, and making YARN more cloud-friendly with features like dynamic resource configuration and container resizing. The presentation estimates the timeline for Apache Hadoop 3.0 releases with alpha, beta, and general availability targeted throughout 2017.
The document discusses evolving HDFS to better support large scale deployments. It summarizes HDFS's strengths in scaling to large clusters and data sizes. However, scaling the large number of small files and blocks is challenging. The solution involves using partial namespaces to store only recently used metadata in memory, and block containers to group blocks together. This will generalize the storage layer to support different container types beyond HDFS blocks. Initial goals are to scale to billions of files and blocks per volume, with the ability to add more volumes for further scaling. The changes will also enable new use cases like block storage and caching data in cloud storage.
This document discusses running Apache Spark and Apache Zeppelin in production. It begins by introducing the author and their background. It then covers security best practices for Spark deployments, including authentication using Kerberos, authorization using Ranger/Sentry, encryption, and audit logging. Different Spark deployment modes like Spark on YARN are explained. The document also discusses optimizing Spark performance by tuning executor size and multi-tenancy. Finally, it covers security features for Apache Zeppelin like authentication, authorization, and credential management.
This document discusses Spark security and provides an overview of authentication, authorization, encryption, and auditing in Spark. It describes how Spark leverages Kerberos for authentication and uses services like Ranger and Sentry for authorization. It also outlines how communication channels in Spark are encrypted and some common issues to watch out for related to Spark security.
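The authentication and encryption switches referred to are ordinary Spark configuration keys. A minimal sketch with illustrative values; on YARN the shared secret is distributed automatically rather than set by hand.

    from pyspark import SparkConf
    from pyspark.sql import SparkSession

    conf = (SparkConf()
        .set("spark.authenticate", "true")               # SASL authentication
        .set("spark.authenticate.secret", "change-me")   # needed outside YARN
        .set("spark.network.crypto.enabled", "true")     # encrypt RPC traffic
        .set("spark.io.encryption.enabled", "true"))     # encrypt shuffle spills

    spark = SparkSession.builder.config(conf=conf).getOrCreate()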
The document discusses the Virtual Data Connector project which aims to leverage Apache Atlas and Apache Ranger to provide unified metadata and access governance across data sources. Key points include:
- The project aims to address challenges of understanding, governing, and controlling access to distributed data through a centralized metadata catalog and policies.
- Apache Atlas provides a scalable metadata repository while Apache Ranger enables centralized access governance. The project will integrate these using a virtualization layer.
- Enhancements to Atlas and Ranger are proposed to better support the project's goals around a unified open metadata platform and metadata-driven governance.
- An initial minimum viable product will be built this year with the goal of an open, collaborative ecosystem around shared
This document discusses using a data science platform to enable digital diagnostics in healthcare. It provides an overview of healthcare data sources and Yale/YNHH's data science platform. It then describes the data science journey process using a clinical laboratory use case as an example. The goal is to use big data and machine learning to improve diagnostic reproducibility, throughput, turnaround time, and accuracy for laboratory testing by developing a machine learning algorithm and real-time data processing pipeline.
This document discusses using Apache Spark and MLlib for text mining on big data. It outlines common text mining applications, describes how Spark and MLlib enable scalable machine learning on large datasets, and provides examples of text mining workflows and pipelines that can be built with Spark MLlib algorithms and components like tokenization, feature extraction, and modeling. It also discusses customizing ML pipelines and the Zeppelin notebook platform for collaborative data science work.
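A minimal sketch of such a pipeline: tokenization, term-frequency features, and a classifier chained as Spark ML stages (the two training rows are invented).

    from pyspark.sql import SparkSession
    from pyspark.ml import Pipeline
    from pyspark.ml.feature import Tokenizer, HashingTF, IDF
    from pyspark.ml.classification import LogisticRegression

    spark = SparkSession.builder.appName("text-mining").getOrCreate()
    train = spark.createDataFrame(
        [("spark makes big data simple", 1.0),
         ("the game ended in a draw", 0.0)],
        ["text", "label"])

    pipeline = Pipeline(stages=[
        Tokenizer(inputCol="text", outputCol="words"),
        HashingTF(inputCol="words", outputCol="tf"),
        IDF(inputCol="tf", outputCol="features"),
        LogisticRegression(maxIter=10),
    ])
    model = pipeline.fit(train)
    model.transform(train).select("text", "prediction").show()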
This document compares the performance of Hive and Spark when running the BigBench benchmark. It outlines the structure and use cases of the BigBench benchmark, which aims to cover common big data analytical properties. It then describes sequential performance tests of Hive+Tez and Spark on queries from the benchmark using an HDInsight PaaS cluster, finding variations in performance between the systems. Concurrency tests are also run by executing multiple query streams in parallel to analyze throughput.
The document discusses modern data applications and architectures. It introduces Apache Hadoop, an open-source software framework for distributed storage and processing of large datasets across clusters of commodity hardware. Hadoop provides massive scalability and easy data access for applications. The document outlines the key components of Hadoop, including its distributed storage, processing framework, and ecosystem of tools for data access, management, analytics and more. It argues that Hadoop enables organizations to innovate with all types and sources of data at lower costs.
This document provides an overview of data science and machine learning. It discusses what data science and machine learning are, including extracting insights from data and computers learning without being explicitly programmed. It also covers Apache Spark, which is an open source framework for large-scale data processing. Finally, it discusses common machine learning algorithms like regression, classification, clustering, and dimensionality reduction.
This document provides an overview of Apache Spark, including its capabilities and components. Spark is an open-source cluster computing framework that allows distributed processing of large datasets across clusters of machines. It supports various data processing workloads including streaming, SQL, machine learning and graph analytics. The document discusses Spark's APIs like DataFrames and its libraries like Spark SQL, Spark Streaming, MLlib and GraphX. It also provides examples of using Spark for tasks like linear regression modeling.
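For the linear-regression use mentioned, a hedged sketch with Spark ML on a tiny invented dataset:

    from pyspark.sql import SparkSession
    from pyspark.ml.feature import VectorAssembler
    from pyspark.ml.regression import LinearRegression

    spark = SparkSession.builder.appName("linreg").getOrCreate()
    data = spark.createDataFrame(
        [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8)], ["x", "y"])

    # Spark ML models expect features packed into a single vector column.
    assembled = VectorAssembler(inputCols=["x"], outputCol="features") \
        .transform(data)
    model = LinearRegression(featuresCol="features", labelCol="y").fit(assembled)
    print(model.coefficients, model.intercept)   # slope close to 2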
This document provides an overview of Apache NiFi and dataflow. It begins with an introduction to the challenges of moving data effectively within and between systems. It then discusses Apache NiFi's key features for addressing these challenges, including guaranteed delivery, data buffering, prioritized queuing, and data provenance. The document outlines NiFi's architecture and components like repositories and extension points. It also previews a live demo and invites attendees to further discuss Apache NiFi at a Birds of a Feather session.
Many organizations currently process various types of data in different formats, and most often this data is free-form. As the number of consumers of this data grows, it is imperative that this free-flowing data adhere to a schema: data consumers then have clear expectations about the data they receive, and they can avoid immediate breakage when an upstream source changes its format. A uniform schema representation also gives the data pipeline an easy way to integrate and support systems that use different data formats.
SchemaRegistry is a central repository for storing and evolving schemas. It provides an API and tooling that help developers and users register a schema and consume it without being impacted when the schema changes. Users can tag different schemas and versions, register for notifications of schema changes by version, and so on.
In this talk, we will go through the need for a schema registry and schema evolution and showcase the integration with Apache NiFi, Apache Kafka, Apache Storm.
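The guarantee a registry enforces is compatible schema evolution. A minimal sketch, independent of any particular registry product, using the fastavro library: a record written with schema v1 stays readable under v2 because the added field carries a default.

    import io
    from fastavro import parse_schema, schemaless_writer, schemaless_reader

    v1 = parse_schema({
        "type": "record", "name": "Truck",
        "fields": [{"name": "id", "type": "long"}],
    })
    v2 = parse_schema({
        "type": "record", "name": "Truck",
        "fields": [{"name": "id", "type": "long"},
                   {"name": "speed", "type": "int", "default": 0}],  # new field
    })

    buf = io.BytesIO()
    schemaless_writer(buf, v1, {"id": 7})   # producer still writes v1
    buf.seek(0)
    print(schemaless_reader(buf, v1, v2))   # v2 reader: {'id': 7, 'speed': 0}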
There is an increasing need for large-scale recommendation systems. Typical solutions rely on periodically retrained batch algorithms, but with massive amounts of data, training a new model can take hours. This is a problem when the model needs to be more up to date: for example, when recommending TV programs while they are being transmitted, the model should take into account users who are watching a program at that time.
The promise of online recommendation systems is fast adaptation to change, but online machine learning from streams is commonly believed to be more restricted, and hence less accurate, than batch-trained models. Combining batch and online learning could lead to a quickly adapting recommendation system with increased accuracy. However, designing a scalable data system that unites batch and online recommendation algorithms is a challenging task. In this talk we present our experiences in creating such a recommendation engine with Apache Flink and Apache Spark.
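A minimal sketch of the online half of such a hybrid: factor matrices are batch-trained elsewhere, then nudged by single SGD steps as rating events stream in (sizes and hyperparameters here are invented).

    import numpy as np

    rng = np.random.default_rng(0)
    n_users, n_items, k = 100, 50, 8
    U = rng.normal(scale=0.1, size=(n_users, k))   # batch-trained user factors
    V = rng.normal(scale=0.1, size=(n_items, k))   # batch-trained item factors

    def online_update(u, i, rating, lr=0.05, reg=0.02):
        # One SGD step on a single streamed (user, item, rating) event.
        err = rating - U[u] @ V[i]
        U[u] += lr * (err * V[i] - reg * U[u])
        V[i] += lr * (err * U[u] - reg * V[i])

    # e.g. a viewer rates a program highly while it is being transmitted:
    online_update(u=3, i=17, rating=5.0)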
Deep learning is not just hype: it outperforms state-of-the-art ML algorithms, one by one. In this talk we show how deep learning can be used to detect anomalies on IoT sensor data streams at high speed, using DeepLearning4J on top of big data engines such as Apache Spark and Apache Flink. Key to this talk is the absence of any large training corpus, since we use unsupervised machine learning, a setting that current deep learning research treats step-motherly. As the demo shows, LSTM networks can learn very complex system behavior; in this case the data comes from a physical model simulating bearing vibrations. One drawback of deep learning is that a very large labeled training data set is normally required, which makes it particularly interesting that unsupervised machine learning can be used in conjunction with deep learning: no labeled data set is necessary. We are able to detect anomalies and predict failing bearings with ten-fold confidence. All examples and all code will be made publicly available and open source; only open source components are used.
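A hedged sketch of the unsupervised setup described: an LSTM autoencoder is trained only on normal windows, and a high reconstruction error later flags an anomaly. This uses Keras with synthetic sine data standing in for vibration measurements; it is an illustration, not the presenters' DeepLearning4J code.

    import numpy as np
    from tensorflow import keras
    from tensorflow.keras import layers

    steps, feats = 50, 1
    normal = np.sin(np.linspace(0, 100, 2000)).reshape(-1, steps, feats)

    model = keras.Sequential([
        layers.LSTM(32, input_shape=(steps, feats)),   # encode the window
        layers.RepeatVector(steps),                    # expand back to a sequence
        layers.LSTM(32, return_sequences=True),        # decode
        layers.TimeDistributed(layers.Dense(feats)),
    ])
    model.compile(optimizer="adam", loss="mse")
    model.fit(normal, normal, epochs=3, verbose=0)     # learn "normal" only

    window = normal[:1] + 0.8                          # a drifted, anomalous window
    print("reconstruction error:", model.evaluate(window, window, verbose=0))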
QE automation for large systems is a great step forward in increasing system reliability. In the big data world, multiple components have to come together to deliver business outcomes to end users. This means QE automation scenarios need to be detailed around actual use cases that cut across components. The system tests can generate large amounts of data on a recurring basis, and verifying it is a tedious job. Given the multiple levels of indirection, the rate of false positives relative to actual defects is high, which is generally wasteful.
At Hortonworks, we designed and implemented an automated log analysis system, Mool, using statistical data science and ML. The work currently in progress has a batch data pipeline followed by an ensemble ML pipeline that feeds a recommendation engine. The system identifies the root cause of test failures by correlating failing test cases with current and historical error records across multiple components. It works in unsupervised mode, with no perfect model, stable build, or source-code version to refer to. In addition, the system provides limited recommendations to file new tickets or reopen past ones, and compares run profiles with past runs.
Improving business performance is never easy! The Natixis Pack is like Rugby. Working together is key to scrum success. Our data journey would undoubtedly have been so much more difficult if we had not made the move together.
This session is the story of how ‘The Natixis Pack’ has driven change in its current IT architecture so that legacy systems can leverage some of the many components in Hortonworks Data Platform in order to improve the performance of business applications. During this session, you will hear:
• How and why the business and IT requirements originated
• How we leverage the platform to fulfill security and production requirements
• How we organize a community to:
o Guard all the players, no one gets left on the ground!
o Use the platform appropriately (not every problem calls for big data, and standard databases are not dead)
• What are the most usable, the most interesting and the most promising technologies in the Apache Hadoop community
We will finish the story of a successful rugby team with insight into the special skills needed from each player to win the match!
DETAILS
This session is part business, part technical. We will talk about infrastructure, security and project management as well as the industrial usage of Hive, HBase, Kafka, and Spark within an industrial Corporate and Investment Bank environment, framed by regulatory constraints.
HBase is a distributed, column-oriented database that stores data in tables divided into rows and columns. It is optimized for random, real-time read/write access to big data. The document discusses HBase's key concepts like tables, regions, and column families. It also covers performance tuning aspects like cluster configuration, compaction strategies, and intelligent key design to spread load evenly. Different use cases are suitable for HBase depending on access patterns, such as time series data, messages, or serving random lookups and short scans from large datasets. Proper data modeling and tuning are necessary to maximize HBase's performance.
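A minimal sketch of the key-design point: salting a monotonically increasing row key so writes spread across regions instead of hammering the region that owns the newest keys. The bucket count and key layout are illustrative.

    import hashlib

    NUM_BUCKETS = 16   # roughly the number of regions to spread writes over

    def salted_key(device_id: str, timestamp_ms: int) -> bytes:
        natural = f"{device_id}:{timestamp_ms}"
        # A stable hash of the natural key picks the bucket, so the same row
        # always lands in, and can be read back from, the same salt bucket.
        salt = int(hashlib.md5(natural.encode()).hexdigest(), 16) % NUM_BUCKETS
        return f"{salt:02d}|{natural}".encode()

    print(salted_key("sensor-42", 1700000000000))

The trade-off: point reads recompute the salt cheaply, but full scans must fan out over all buckets.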
There has been an explosion of data digitising our physical world, from cameras, environmental sensors and embedded devices right down to the phones in our pockets. This means that companies now have new ways to transform their businesses, both operationally and through their products and services, by leveraging this data and applying fresh analytical techniques to make sense of it. But are they ready? The answer is "no" in most cases.
In this session, we’ll be discussing the challenges facing companies trying to embrace the Analytics of Things, and how Teradata has helped customers work through and turn those challenges to their advantage.
In this talk, we will present a new distribution of Hadoop, Hops, that can scale the Hadoop Filesystem (HDFS) by 16X, from 70K ops/s to 1.2 million ops/s on Spotify's industrial Hadoop workload. Hops is an open-source distribution of Apache Hadoop that supports distributed metadata for HDFS (HopsFS) and the ResourceManager in Apache YARN. HopsFS is the first production-grade distributed hierarchical filesystem to store its metadata normalized in an in-memory, shared-nothing database. For YARN, we will discuss optimizations that enable 2X throughput increases for the Capacity scheduler, enabling scalability to clusters with >20K nodes. We will discuss the journey of how we reached this milestone, including some of the challenges involved in efficiently and safely mapping hierarchical filesystem metadata state and operations onto a shared-nothing, in-memory database. We will also discuss the key database features needed for extreme scaling, such as multi-partition transactions, partition-pruned index scans, distribution-aware transactions, and the streaming changelog API. Hops (www.hops.io) is Apache-licensed open source and supports a pluggable database backend for distributed metadata, although it currently supports only MySQL Cluster as a backend. Hops opens up new directions for Hadoop when metadata is available for tinkering in a mature relational database.
In high-risk manufacturing industries, regulatory bodies stipulate continuous monitoring and documentation of critical product attributes and process parameters. On the other hand, sensor data coming from production processes can be used to gain deeper insights into optimization potentials. By establishing a central production data lake based on Hadoop and using Talend Data Fabric as a basis for a unified architecture, the German pharmaceutical company HERMES Arzneimittel was able to cater to compliance requirements as well as unlock new business opportunities, enabling use cases like predictive maintenance, predictive quality assurance or open world analytics. Learn how the Talend Data Fabric enabled HERMES Arzneimittel to become data-driven and transform Big Data projects from challenging, hard to maintain hand-coding jobs to repeatable, future-proof integration designs.
Talend Data Fabric combines Talend products into a common set of powerful, easy-to-use tools for any integration style: real-time or batch, big data or master data management, on-premises or in the cloud.
While you might be tempted to assume data is already safe in a single Hadoop cluster, in practice you have to plan for more. Questions like "What happens if the entire datacenter fails?" or "How do I recover into a consistent state of data, so that applications can continue to run?" are not at all trivial to answer for Hadoop. Did you know that HDFS snapshots do not treat open files as immutable? Or that HBase snapshots are executed asynchronously across servers and therefore cannot guarantee atomicity for cross-region updates (which includes tables)? There is no unified and coherent data backup strategy, nor is tooling available for many of the included components to build such a strategy. The Hadoop distributions largely avoid this topic, as most customers are still in the "single use-case" or PoC phase, where data governance, as far as backup and disaster recovery (BDR) is concerned, is not (yet) important. This talk first introduces the overarching issues and difficulties of backup and data safety, looking at each of the many components in Hadoop, including HDFS, HBase, YARN, Oozie, and the management components, and finally shows a viable approach using built-in tools. You will also learn not to take this topic lightheartedly, and what is needed to implement and guarantee continuous operation of Hadoop cluster based solutions.
2. Company Profile
Company name: SAISON INFORMATION SYSTEMS CO., LTD.
Established: September 1, 1970 ■ Representative: 内田 和弘 (Kazuhiro Uchida), President and Representative Director
Capital: ¥1,367,687,500 ■ Listing: Tokyo Stock Exchange, JASDAQ Standard market
Business: HULFT series products, card systems, and distribution/IT solutions
The top market share in managed file transfer tools
No. 1 in customer satisfaction among data integration tools
22. Sensor Data Remix Hackathon on Amazon Web Services
1. Adult Fitness Test / detects activity levels with sensors and displays them as graphs
2. Room Watch α / a system to check whether lights have been left on when no one is in the room
3. Hotspot Finder / uses sensors to visualize crowded places on a map
4. GuzzREACH / detects a baby's fussing with sensors and sends notification emails
5. IoN / visualizes the energy (groove) of concert and live-show audiences
Held Saturday, January 17, 2015, in Meguro-ku, Tokyo
An event report is posted at http://dstn.appresso.com/blogdetail?id=4971
25. Microsoft Azure Adapter
Supported services: HDInsight, Machine Learning, Azure SQL, Blob Storage, DocumentDB, Queue Storage, Service Bus
Joint development with:
27. Microsoft Azure demo image
A company operating multiple restaurants recommends restaurants based on user membership data. The imported membership data is analyzed on HDInsight; Machine Learning performs the data analysis, and the recommendation results are output and displayed.
Flow: user data is stored in Azure Blob storage; a Hive job shapes the imported data; machine learning runs in Machine Learning on the data stored in Blob; the results are retrieved and output, with a log recorded.