Cloudera Data Science WorkbenchとPySparkで 好きなPythonライブラリを 分散で使う #cadedaCloudera Japan
Data Engineering and Data Analysis Workshop #1 での有賀 (@chezou)の発表です。
https://cyberagent.connpass.com/event/58808/
Cloudera Data Science WorkbenchとPySparkを使い、Pythonで好きなライブラリを分散実行する方法についてです。日本語の形態素解析ライブラリMeCabをPySparkから実行します。
Cloudera Data Science WorkbenchとPySparkで 好きなPythonライブラリを 分散で使う #cadedaCloudera Japan
Data Engineering and Data Analysis Workshop #1 での有賀 (@chezou)の発表です。
https://cyberagent.connpass.com/event/58808/
Cloudera Data Science WorkbenchとPySparkを使い、Pythonで好きなライブラリを分散実行する方法についてです。日本語の形態素解析ライブラリMeCabをPySparkから実行します。
This document discusses how to make software more green and environmentally friendly. It defines green software as software that is carbon efficient, energy efficient, hardware efficient, and carbon aware. It provides recommendations for various roles within an organization on driving green initiatives, including focusing on efficiency for CxOs, architects, infrastructure engineers, and developers. Examples include optimizing resource usage, using public clouds effectively, prioritizing equipment standardization, and developing applications that can run more efficiently.
Simple and Effective Knowledge-Driven Query Expansion for QA-Based Product At...Rakuten Group, Inc.
The document proposes a knowledge-driven query expansion approach for question answering (QA)-based product attribute extraction. It trains QA models using attribute-value pairs from training data as knowledge, while mimicking imperfect knowledge at test time through techniques like knowledge dropout and token mixing. This helps induce better query representations, especially for rare and ambiguous attributes. Experiments on a cleaned product attribute dataset show the proposed approach with all techniques outperforms baseline methods in both macro and micro F1 scores.
This document summarizes Andrew Hajinikitas' work developing Rakuten's private cloud infrastructure. It describes the key components of Rakuten's infrastructure including metal instances, microservers, and GPU servers. It provides details on Rakuten's software stack and their goals to expand managed services. Currently, Rakuten operates 9 data centers in Japan and overseas providing around 30,000 servers to support their ecosystem. Their future plans include extending network self-service, making GPU resources available as a platform service, and improving efficiency through optimized hardware selection.
The document discusses the Travel & Leisure Platform Dept and its responsibilities related to data and platform management. It provides an overview of the technical stack including private/public clouds, databases, containers, and automation/monitoring tools. It then discusses recent projects involving business continuity, containerization, alert integration, and automation. Finally, it describes open roles for a DBA and DevOps position and their responsibilities related to database provisioning, backup/recovery, infrastructure as code, and providing platforms and tools for developers.
This presentation introduces the OWASP Top 10:2021.
It explains how to look at the data related to OWASP Top 10:2021, and provides detailed explanations of items with distinctive data. It also introduces the OWASP Project related to each item.
Gora API Group technology provides a microservices architecture and APIs for Rakuten's golf course reservation system, improving the user experience and increasing customer loyalty and annual golf rounds. The architecture migrates the monolithic reservation system to microservices using Kotlin, Spring Boot, and other technologies, exposing APIs for the frontend and new products while sustaining the legacy system through services, queues, continuous delivery, and operations monitoring.
13. 検索での関連語提示や辞書構築での活用
検索での関連語提示や辞書構築での活用
での関連語提示 での
クラスターから検索解析用のHiveにつなげ
関連語の提示や辞書構築等での活用
月 250GBのデータを解析
suggest batch
server
Suggest
Index
sync analyzed update search index
data
Shared Hadoop
Cluster NGS Hive dictionary batch 検索エンジン
Server
NGS common
Dictionary
platform for hive Index
update search index
13