Spark & Tachyon - First in the industry to offer commercial support for Spark and Tachyon. Continuously updated to the latest stable versions.
Resource Manager - Based on Mesos and enhanced to eliminate framework starvation issues.
Spark Job Server - Built and open sourced by Atigeo; a REST server for Spark that supports Mesos and multiple contexts.
Jaws: Warehouse Explorer - Built and open sourced by Atigeo; an HTTP server and user interface for running queries and managing metadata on top of a Shark or SparkSQL store.
Data Ingestion - Visually or via REST API, define automated data ingestion processes that ingest large datasets in parallel and provide real-time status and monitoring.
Data Transform - Visually or via REST API, define automated data transformation workflows that coordinate multiple distributed execution frameworks.
Cluster Monitoring - Active monitoring and admin UIs for the entire cluster, including machine, operating system, server and service monitors.
1. xPatterns Connect 5.0
Mesos, Tachyon, Spark, Spark SJR, JAWS, Hive, Cassandra, Solr
Radu MOLDOVAN
Senior Team Lead
BUCHAREST July 2015
2. About me
• 20 years of programming (open source)
• last 3 years worked in Big Data
• Team Lead @
• building the 5th generation of xPatterns Platform
3. What is xPatterns Connect?
xPatterns is a software platform to build intelligent, self-improving, petabyte-scale, enterprise-grade, data-driven applications.
Connect is a pre-built big data analytics technology that bypasses traditional ETL & cluster configuration
23. Open Source contributions
Spark Job Server - https://github.com/Atigeo/spark-job-rest
Solves inability to run multiple Spark contexts from the same JVM
Runs multiple Spark contexts, each in its own JVM
Job submission in Java + Scala
Jaws - http://github.com/Atigeo/http-spark-sql-server
RESTful service for running Spark SQL/Shark queries on top of Spark
xPatterns API & samples
https://github.com/Atigeo/xpatterns-spark-api
https://github.com/Atigeo/xpatterns-spark-demo
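The idea behind giving each Spark context its own JVM can be illustrated with plain operating-system processes. The sketch below is not the spark-job-rest implementation: it uses Python child processes as stand-ins for JVMs, and the names `start_context`, `collect` and the context labels are hypothetical. It only shows that each named context ends up in a separate process, isolated from the parent and from its siblings.

```python
import os
import subprocess
import sys

# Code run by each child: report the context name and the child's own process id.
CHILD_CODE = "import os, sys; print(sys.argv[1], os.getpid())"


def start_context(name):
    """Launch one child process for the named context (hypothetical helper)."""
    return subprocess.Popen(
        [sys.executable, "-c", CHILD_CODE, name],
        stdout=subprocess.PIPE,
        text=True,
    )


def collect(procs):
    """Wait for every child and map context name -> process id."""
    pids = {}
    for proc in procs:
        stdout, _ = proc.communicate(timeout=30)
        name, pid = stdout.split()
        pids[name] = int(pid)
    return pids


parent_pid = os.getpid()
contexts = collect([start_context(n) for n in ("ctx-a", "ctx-b")])
```

Because every context lives in its own process, a crash or a conflicting setting in one context cannot affect the others, which is the property the single-JVM setup lacks.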
The Data Import module gets data into the xPatterns platform. Data can be ingested into HDFS storage or the Tachyon in-memory distributed file system.
The Workflow module of the xPatterns platform is a workflow engine with monitoring and quality gates for big data applications.
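A workflow with quality gates can be pictured as a chain of steps in which each step's output must pass a check before the next step runs. The following is a minimal sketch under that assumption, not the xPatterns Workflow API; `run_workflow`, the step names and the gate functions are all hypothetical.

```python
def run_workflow(steps, data):
    """Run steps in order; stop at the first step whose quality gate fails.

    Each step is a (name, transform, gate) tuple: `transform` produces the
    step's output and `gate` returns True if that output is acceptable.
    """
    log = []
    for name, transform, gate in steps:
        data = transform(data)
        passed = gate(data)
        log.append((name, "passed" if passed else "failed"))
        if not passed:
            return None, log  # downstream steps never see bad data
    return data, log


steps = [
    # Gate: at least one row must survive the cleanup.
    ("drop_nulls",
     lambda rows: [r for r in rows if r is not None],
     lambda rows: len(rows) > 0),
    # Gate: squared values can never be negative.
    ("square",
     lambda rows: [r * r for r in rows],
     lambda rows: all(r >= 0 for r in rows)),
]

output, log = run_workflow(steps, [1, None, 3])
failed_output, failed_log = run_workflow(steps, [None])
```

The returned log doubles as the monitoring record: it shows which steps ran and where a run was halted.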
The Warehouse Explorer module is an open source REST service, built by Atigeo, for running Spark SQL queries on Spark, with support for both Mesos and Tachyon.
The Experimentation module is a Python distribution designed to empower data scientists with the ability to quickly prototype ideas, execute those ideas against large sets of data, and integrate the results into a production application while abstracting the infrastructure layer as much as possible.
The Administration module allows you to manage the modules that are exposed in the Management Console. The settings are applied to modules and components in the cluster, as well as user profiles and groups.
The Monitoring module allows you to monitor the status and performance of applications and services that are running on the xPatterns cluster as well as view historical data.
explore warehouse - browse Hive tables
concurrently and asynchronously submit SQL queries on top of Spark context
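Concurrent, asynchronous submission usually means that a submit call returns an identifier immediately and the caller checks status or fetches results later. The sketch below illustrates that pattern with a thread pool. It is not the Jaws API: `QueryService`, `fake_execute` and the status strings are hypothetical, and `fake_execute` stands in for whatever would actually run the SQL on a shared Spark context.

```python
import uuid
from concurrent.futures import ThreadPoolExecutor


class QueryService:
    """Hypothetical in-process stand-in for an asynchronous query endpoint."""

    def __init__(self, execute, max_workers=4):
        self._execute = execute
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self._futures = {}

    def submit(self, sql):
        """Queue the query and return its id without waiting for it to run."""
        query_id = str(uuid.uuid4())
        self._futures[query_id] = self._pool.submit(self._execute, sql)
        return query_id

    def status(self, query_id):
        future = self._futures[query_id]
        if not future.done():
            return "IN_PROGRESS"
        return "FAILED" if future.exception() else "DONE"

    def result(self, query_id, timeout=None):
        """Block until the query finishes, then return its rows."""
        return self._futures[query_id].result(timeout=timeout)

    def shutdown(self):
        self._pool.shutdown(wait=True)


def fake_execute(sql):
    # Assumption: a real implementation would run the SQL on a Spark context.
    return [("echo", sql)]


service = QueryService(fake_execute)
ids = [service.submit(q) for q in ("SHOW TABLES", "SELECT 1")]
rows = [service.result(i, timeout=5) for i in ids]
statuses = [service.status(i) for i in ids]
service.shutdown()
```

Several clients can call `submit` at the same time; each query runs on a pool worker while the callers stay free to do other work.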