
BIG DATA Services and Tools

Summary of some big data services and tools

Published in: Data & Analytics

  1. BIG DATA Services and Tools
  2. Services: Big Data Analytics
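  3. Mixpanel (US) ● https://mixpanel.com/segmentation/ ● http://youtu.be/nR2MzOeMoLc ● Allows more detailed user-behavior analysis than Google Analytics ● Supports A/B testing and funnel driver analysis ● Data can be collected and analyzed with a small amount of code via the Mixpanel API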
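The funnel analysis mentioned for Mixpanel can be illustrated with a small Python sketch. It does not use Mixpanel's API; the `funnel_counts` helper, the event names, and the sample data are all hypothetical and serve only to show how ordered step counts might be computed from raw events.

```python
from collections import defaultdict


def funnel_counts(events, steps):
    """Count how many users reach each funnel step, in order.

    events: iterable of (user_id, event_name, timestamp) tuples.
    steps:  ordered list of event names that define the funnel.
    """
    # Collect each user's events so they can be replayed in time order.
    by_user = defaultdict(list)
    for user_id, name, ts in events:
        by_user[user_id].append((ts, name))

    counts = [0] * len(steps)
    for user_events in by_user.values():
        user_events.sort()
        next_step = 0
        for _, name in user_events:
            # A user only advances when the next expected step occurs.
            if next_step < len(steps) and name == steps[next_step]:
                counts[next_step] += 1
                next_step += 1
    return counts


# Hypothetical sample data (not real Mixpanel output).
events = [
    ("u1", "view", 1), ("u1", "signup", 2), ("u1", "purchase", 3),
    ("u2", "view", 1), ("u2", "signup", 5),
    ("u3", "view", 2),
    ("u4", "signup", 1),  # never viewed, so never enters the funnel
]
steps = ["view", "signup", "purchase"]
counts = funnel_counts(events, steps)
```

Dividing adjacent entries of `counts` gives the step-to-step conversion rate, which is the number a funnel report typically highlights.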
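  4. Five Rocks (South Korea) ● https://www.5rocks.io/en/technology ● Mixpanel-like features, enhanced for advertising and games ● Acquired by Tapjoy
  5. Intimate Merger ● http://corp.intimatemerger.com/ ● Centrally manages and analyzes big data accumulated on various servers across the Internet, together with log data from a company's own site ● Optimizes action plans such as ad delivery
  6. Flurry ● http://www.flurry.com/ ● Mobile analytics: analysis of user-app interactions ● In-app advertising system
  7. Google Analytics ● http://www.google.com/analytics/features/ ● http://youtu.be/WC3ONXJn9FQ ● Analysis tools ● Content analytics ● Social analytics ● Mobile access analytics ● Conversion analytics ● Advertising analytics
  8. Services: Smartphone App Crash Log Analysis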
  9. Crittercism ● https://www.crittercism.com/ ● Add the library to your app ● Free and paid versions available (freemium model) ● The paid version is $24/month ● Classifies crash logs into three types: Unresolved, Known, Resolved ● Groups similar logs together
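Grouping similar crash logs, as described for Crittercism, could work roughly as in the Python sketch below. This is not Crittercism's actual algorithm, which the slides do not describe; the log format, the `signature` and `group_crashes` helpers, and the normalization rules are assumptions chosen for illustration.

```python
import re
from collections import defaultdict


def signature(log, top_frames=3):
    """Build a grouping key from the exception type and the top stack frames."""
    frames = []
    for frame in log["stack"][:top_frames]:
        # Drop line numbers and hex addresses so that small differences
        # between builds do not split otherwise identical crashes.
        frame = re.sub(r":\d+", "", frame)
        frame = re.sub(r"0x[0-9a-fA-F]+", "ADDR", frame)
        frames.append(frame)
    return (log["exception"], tuple(frames))


def group_crashes(logs):
    """Bucket crash logs that share the same signature."""
    groups = defaultdict(list)
    for log in logs:
        groups[signature(log)].append(log)
    return groups


# Hypothetical crash logs (the field names are assumptions).
logs = [
    {"exception": "NullPointerException",
     "stack": ["MainActivity.onCreate:42", "Activity.performCreate:5008"]},
    {"exception": "NullPointerException",
     "stack": ["MainActivity.onCreate:45", "Activity.performCreate:5008"]},
    {"exception": "OutOfMemoryError",
     "stack": ["BitmapFactory.decode:120"]},
]
groups = group_crashes(logs)
```

Each resulting group could then carry one status such as Unresolved, Known, or Resolved, instead of tracking every individual log.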
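  10. Bugsense (Splunk MINT Express) ● https://mint.splunk.com/ ● Almost the same as Crittercism ● The paid version is $19/month ● The free plan can collect only up to 500 logs per month
  11. Smartbeat ● http://smrtbeat.com/ ● http://youtu.be/P7y4gOASy80 ● Multi-platform support ● In addition to iOS (Objective-C) and Android (Java), detects and analyzes crashes and exceptions in the iOS C/C++ and Android NDK (C/C++) layers ● Also supports error detection for the game engines Unity (C#, JS) and Cocos2d-x ● Lets you view a capture of the user's screen just before the crash
  12. Tools: Collection, Storage, Search, Sharing, Analysis, Visualization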
  13. Cloud: Amazon Web Services (AWS) ● EC2: virtual private servers using Xen. ● EMR (Elastic MapReduce): allows businesses, researchers, data analysts, and developers to easily and cheaply process vast amounts of data. It uses a hosted Hadoop framework running on the web-scale infrastructure of EC2 and Amazon S3. ● S3: web-based storage. ● Redshift: petabyte-scale data warehousing with column-based storage and multi-node compute. ● SimpleDB: allows developers to run queries on structured data. It operates in concert with EC2 and S3 to provide "the core functionality of a database". ● DynamoDB: scalable, low-latency NoSQL online database service backed by SSDs. ● RDS: scalable database server with MySQL, Oracle, SQL Server, and PostgreSQL support. ● http://en.wikipedia.org/wiki/Amazon_Web_Services
  14. Cloud: Google Cloud Platform https://cloud.google.com/
  15. Fluentd http://www.fluentd.org/architecture
  16. Fluentd + Elasticsearch + Kibana ● http://docs.fluentd.org/articles/free-alternative-to-splunk-by-fluentd ● Elasticsearch: distributed, scalable search engine ● Kibana: visualization UI for Elasticsearch
  17. http://www.elasticsearch.org/overview/kibana
  18. MongoDB http://www.mongodb.org/ ● Document-oriented database that is easy to use. It makes the following simple: ● Replication and high availability ● Auto-sharding ● Map/Reduce
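The Map/Reduce model listed for MongoDB can be sketched in plain Python over a list of documents. This is an in-memory illustration of the idea only, not MongoDB's API; the `map_reduce` helper, the document fields, and the sample data are hypothetical.

```python
from collections import defaultdict


def map_reduce(documents, map_fn, reduce_fn):
    """Run a single-process map/reduce over an iterable of documents."""
    # Map phase: each document emits zero or more (key, value) pairs.
    emitted = defaultdict(list)
    for doc in documents:
        for key, value in map_fn(doc):
            emitted[key].append(value)
    # Reduce phase: collapse all values emitted for a key into one result.
    return {key: reduce_fn(key, values) for key, values in emitted.items()}


def map_page_views(doc):
    # Emit one count per page view.
    yield doc["page"], 1


def sum_values(key, values):
    return sum(values)


# Hypothetical page-view documents.
documents = [
    {"user": "u1", "page": "/home"},
    {"user": "u2", "page": "/home"},
    {"user": "u1", "page": "/pricing"},
]
result = map_reduce(documents, map_page_views, sum_values)
```

In a sharded database the map phase would run on each shard in parallel and only the emitted pairs would be combined, which is why the model scales with the data.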
  19. http://www.mongodb.com/use-cases/real-time-analytics
  20. Cassandra http://cassandra.apache.org/ ● Distributed, eventually consistent, column-oriented database ● Harder to use than MongoDB, but said to be faster
  21. http://www.datastax.com/wp-content/uploads/2013/02/WP-Benchmarking-Top-NoSQL-Databases.pdf
  22. Hazelcast http://hazelcast.org/ ● In-memory, multiple-node data grid cluster ● Distributed Data Structures: Map, MultiMap, Queue, Set, List, Topic, Lock, AtomicLong, AtomicReference, Semaphore, CountDownLatch, IdGenerator ● Distributed Computing: Executor Service, Entry Processor ● Distributed Query: MapReduce, Aggregators ● Integrated Clustering: JCache, Hibernate Second Level Cache, Servlet Session Replication, Spring Integration, J2EE Transactions ● Client interfaces: Java, C++, .NET, REST, Memcache
  23. Hadoop http://hadooparchitecturetraining.blogspot.jp/2013/05/apache-hadoop-components-ecosystem.html
  24. Realtime: Storm https://storm.apache.org/ ● Distributed realtime computation system. ● Storm makes it easy to reliably process unbounded streams of data, doing for realtime processing what Hadoop did for batch processing. ● Use cases: realtime analytics, online machine learning, continuous computation, distributed RPC, ETL, and more.
  25. Batch + Realtime: Spark https://spark.apache.org/ http://www.slideshare.net/search/slideshow?q=apache+spark ● Speed: the project claims up to 100x faster than Hadoop MapReduce in memory, or 10x faster on disk.
  26. Runs Everywhere: Spark runs on Hadoop, Mesos, standalone, or in the cloud. It can access diverse data sources including HDFS, Cassandra, HBase, S3.
  27. Generality: Combine SQL, streaming, and complex analytics.
  28. Machine learning features (MLlib 1.1): ● linear SVM and logistic regression ● classification and regression tree ● k-means clustering ● recommendation via alternating least squares ● singular value decomposition ● linear regression with L1- and L2-regularization ● multinomial naive Bayes ● basic statistics ● feature transformations
  29. Graph features: ● GraphX unifies ETL, exploratory analysis, and iterative graph computation within a single system. ● Seamlessly work with both graphs and collections: You can view the same data as both graphs and collections, transform and join graphs with RDDs efficiently, and write custom iterative graph algorithms using the Pregel API. ● Algorithms: PageRank, Connected components, Label propagation, SVD++, Strongly connected components, Triangle count...
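PageRank, the first of the GraphX algorithms listed above, is an example of the iterative graph computation the slide refers to. The Python sketch below uses the textbook power-iteration formulation on a single machine; it is not GraphX's Pregel-based implementation, and the `pagerank` function, its parameters, and the sample graph are assumptions made for illustration.

```python
def pagerank(graph, damping=0.85, iterations=50):
    """Textbook PageRank by power iteration.

    graph: dict mapping every node to the list of nodes it links to.
           Every node must appear as a key, even if it has no out-links.
    """
    nodes = list(graph)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}

    for _ in range(iterations):
        # Every node starts each round with the "random jump" share.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node, targets in graph.items():
            if targets:
                # Split this node's rank evenly among its out-links.
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling node: spread its rank evenly over all nodes.
                share = damping * rank[node] / n
                for other in nodes:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical four-node graph.
graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(graph)
```

Here the ranks sum to 1 and node `c`, which has the most incoming links, ends up with the highest score. A distributed engine expresses the same per-round update as messages sent along edges.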
  30. Streaming features: Spark Streaming can read data from HDFS, Flume, Kafka, Twitter and ZeroMQ. You can also define your own custom data sources.
  31. Cloud: Databricks https://databricks.com/product ● A company founded by the creators of Apache Spark that aims to help clients with cloud-based big data processing using Spark.
  32. http://youtu.be/dJQ5lV5Tldw
  33. Visualization tools ● In the web era, visualization tools are basically written in JavaScript: https://github.com/sorrycc/awesome-javascript#data-visualization
  34. Example: D3 https://github.com/mbostock/d3/wiki/Gallery
  35. Example: Query Builder http://mistic100.github.io/jQuery-QueryBuilder/
