This presentation discusses a data sharing and synchronization tool implemented using Infinispan. The focus is on medical images and metadata, such as those of The Cancer Imaging Archive (TCIA).
Machine Learning on Distributed Systems by Josh Poduska (Data Con LA)
Abstract: Most real-world data science workflows require more than multiple cores on a single server to meet scale and speed demands, but there is a general lack of understanding of what machine learning on distributed systems looks like in practice. Gartner and Forrester do not consider distributed execution when they score advanced analytics software solutions. Much formal machine learning training occurs on single-node machines with non-distributed algorithms. In this talk we discuss why an understanding of distributed architectures is important for anyone in the analytical sciences. We will cover the current distributed machine learning ecosystem, review common pitfalls when performing machine learning at scale, and discuss architectural considerations for a machine learning program, such as the roles of storage and compute and under what circumstances they should be combined or separated.
Short introduction to ML frameworks on Hadoop (Yuya Takashina)
This document provides a short introduction to machine learning (ML) frameworks built on Hadoop, including Hadoop, Spark, and Petuum. It notes that Hadoop is the de facto standard for distributed storage and processing of big data. Spark is 10x faster than Hadoop for some applications by caching data in memory. Petuum is even faster than Spark for ML by using asynchronous communication to reduce network costs while still guaranteeing convergence, and provides deep learning APIs.
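To make the caching point concrete, here is a minimal PySpark sketch (assuming a local Spark installation and a synthetic dataset; it is illustrative, not the benchmark behind the 10x figure): the first action materializes and caches the data, and later actions are served from executor memory.

```python
# Minimal PySpark caching sketch (illustrative; assumes a local Spark install).
import time
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("cache-demo").getOrCreate()

df = spark.range(0, 50_000_000).withColumnRenamed("id", "value")
df.cache()          # keep the dataset in executor memory after first use

t0 = time.time()
df.count()          # first action materializes and caches the data
t1 = time.time()
df.count()          # second action is served from the in-memory cache
t2 = time.time()

print(f"first pass: {t1 - t0:.2f}s, cached pass: {t2 - t1:.2f}s")
spark.stop()
```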
This document discusses the benefits and constraints of traditional enterprise data warehouses (EDW) and Hadoop frameworks. It notes that EDWs are used for reporting and analysis but require expensive ETL processes to load structured data into tables. Hadoop provides linear scalability, lower costs, and supports both SQL and non-SQL queries by keeping metadata and storage separate. The document argues that EDWs and Hadoop can coexist, with Hadoop handling ETL workloads and acting as low-cost storage, while EDWs focus on reporting and analytics using existing BI tools connected to Hadoop.
PostgreSQL Extension APIs are Changing the Face of Relational Databases | PGC... (Teresa Giacomini)
PostgreSQL is becoming the relational database of choice. An important factor in the rising popularity of Postgres is the extension APIs that allow developers to improve any database module’s behavior. As a result, Postgres users have access to hundreds of extensions today.
In this talk, we're going to first describe extension APIs. Then, we’re going to present four popular Postgres extensions, and demo their use.
* PostGIS turns Postgres into a spatial database through adding support for geographic objects.
* HLL & TopN add approximation algorithms to Postgres. These algorithms are used when real-time responses matter more than exact results.
* pg_partman makes managing partitions in Postgres easy. Through partitions, Postgres provides 5-10x higher performance for time-series data.
* Citus transforms Postgres into a distributed database. To do this, Citus shards data, performs distributed deadlock detection, and parallelizes queries.
Finally, we’ll conclude with why we think Postgres sets the way forward for relational databases.
PostgreSQL is becoming the relational database of choice. One important factor in the rising popularity of Postgres is the extension APIs. These APIs allow developers to extend any database sub-module’s behavior for higher performance, security, or new functionality. As a result, Postgres users have access to over a hundred extensions today, with more to come in the future.
In this talk, I’m going to first describe PostgreSQL’s extension APIs. These APIs are unique to Postgres, and have the potential to change the database landscape. Then, we’re going to present the four most popular Postgres extensions, show the use cases where they are applicable, and demo their usage.
PostGIS turns Postgres into a spatial database. It adds support for geographic objects, allowing location queries to be run in SQL.
HyperLogLog (HLL) & TopN add approximation algorithms to Postgres. These sketch algorithms are used in distributed systems when real-time responses to queries matter more than exact results.
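To illustrate the idea behind such sketch algorithms, here is a toy HyperLogLog in plain Python; it mirrors the register-and-estimate scheme conceptually but is not the HLL extension's implementation, and the parameters are illustrative assumptions.

```python
import hashlib
import math

class ToyHLL:
    """Minimal HyperLogLog sketch, for illustration only."""
    def __init__(self, p=12):
        self.p = p                  # index bits
        self.m = 1 << p             # number of registers
        self.reg = [0] * self.m

    @staticmethod
    def _hash64(item):
        # 64-bit hash derived from SHA-256 (illustrative choice)
        return int(hashlib.sha256(str(item).encode()).hexdigest()[:16], 16)

    def add(self, item):
        h = self._hash64(item)
        idx = h & (self.m - 1)      # low p bits choose a register
        w = h >> self.p             # remaining bits
        if w == 0:
            rho = 64 - self.p + 1
        else:
            rho = 1                 # position of the first set bit in w
            while not (w & 1):
                w >>= 1
                rho += 1
        self.reg[idx] = max(self.reg[idx], rho)

    def estimate(self):
        alpha = 0.7213 / (1 + 1.079 / self.m)
        raw = alpha * self.m * self.m / sum(2.0 ** -r for r in self.reg)
        zeros = self.reg.count(0)
        if raw <= 2.5 * self.m and zeros:   # small-range (linear counting) correction
            return int(self.m * math.log(self.m / zeros))
        return int(raw)

hll = ToyHLL()
for i in range(100_000):
    hll.add(f"user-{i}")
print(hll.estimate())   # roughly 100,000, typically within a few percent
```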
pg_partman makes creating and managing partitions in Postgres easy. Through careful partition management with pg_partman, Postgres offers 5-10x higher write and query performance for time-series data.
Citus transforms Postgres into a distributed database. Citus transparently shards and replicates data, performs distributed deadlock detection, and parallelizes queries.
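A hedged sketch of the routing idea behind hash-based sharding follows; the shard count and hash function are illustrative assumptions, not Citus internals.

```python
# Toy hash-based sharding: map a distribution-key value to one of N shards.
import zlib

SHARD_COUNT = 32

def shard_for(distribution_key: str) -> int:
    """Map a distribution column value to one of SHARD_COUNT shards."""
    return zlib.crc32(distribution_key.encode()) % SHARD_COUNT

rows = [("tenant-42", "click"), ("tenant-7", "view"), ("tenant-42", "purchase")]
for tenant, event in rows:
    print(tenant, event, "-> shard", shard_for(tenant))
# Rows that share a distribution key land on the same shard,
# which keeps per-tenant joins local to one node.
```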
After demoing these popular extensions, we’ll conclude with why we think the monolithic relational database is dying and how Postgres sets a path for the future. We’ll end the talk with a Q&A.
Greenplum is the first open source Massively Parallel Processing (MPP) data warehouse, built with over two million lines of code. MPP allows a program to run across multiple processors that each use their own memory and operating system. Greenplum was released under the Apache license and differs functionally and architecturally from other open source data systems through its use of MPP to execute complex SQL analytics over large datasets at high speeds. As an open source system, Greenplum assures customers that their software needs will be met long-term.
Building High Performance MySQL Query Systems and Analytic Applications (Calpont)
This presentation describes how to build fast-running MySQL applications that serve read-based systems. It takes a special look at column databases and Calpont's InfiniDB.
This document discusses challenges in large scale machine learning. It begins by discussing why distributed machine learning is necessary when data is too large for one computer to store or when models have too many parameters. It then discusses various challenges that arise in distributed machine learning including scalability issues, class imbalance, the curse of dimensionality, overfitting, and algorithm complexities related to data loading times. Specific examples are provided of distributing k-means clustering and spectral clustering algorithms. Distributed implementations of support vector machines are also discussed. Throughout, it emphasizes the importance of understanding when and where distributed approaches are suitable compared to single machine learning.
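To make the k-means example concrete, here is a minimal data-parallel sketch in Python: each simulated worker computes per-centroid partial sums on its partition, and a driver aggregates them. It is illustrative only, not a distributed implementation.

```python
# Data-parallel k-means sketch: map computes partial sums per partition,
# reduce aggregates them into new centroids (single process, illustrative).
import numpy as np

def assign_and_sum(partition, centroids):
    """Map step on one partition: per-centroid running sums and counts."""
    k, d = centroids.shape
    sums, counts = np.zeros((k, d)), np.zeros(k)
    for x in partition:
        c = np.argmin(np.linalg.norm(centroids - x, axis=1))
        sums[c] += x
        counts[c] += 1
    return sums, counts

rng = np.random.default_rng(0)
data = rng.normal(size=(10_000, 2)) + rng.choice([-5, 5], size=(10_000, 1))
partitions = np.array_split(data, 4)            # stand-in for 4 workers
centroids = data[rng.choice(len(data), 3, replace=False)]

for _ in range(10):
    partials = [assign_and_sum(p, centroids) for p in partitions]   # map (parallelizable)
    total_sums = sum(s for s, _ in partials)
    total_counts = sum(c for _, c in partials)
    centroids = total_sums / np.maximum(total_counts, 1)[:, None]   # reduce step
print(centroids)
```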
There are several types of databases that can be used depending on needs and priorities. A centralized database stores all data in one location, making organization and backups easier but potentially slowing performance under high usage. Distributed databases split data across multiple locations for faster retrieval from nearby sites, though accessing distant data can be slower and ensuring consistency is important. Horizontal and vertical partitioning further divide distributed databases by specific criteria such as common fields or geographic regions. Replication copies all data to multiple locations so it can be accessed locally, with changes synced to the central database during off-peak times. Central indexes link to the actual data stored elsewhere, reducing updates to the main database but potentially causing delays in retrieving data. Data warehouses and data mining analyze the stored information.
The thinking persons guide to data warehouse design (Calpont)
The document discusses key considerations for designing a data warehouse, including building a logical design, transitioning to a physical design, and monitoring and tuning the design. It recommends using a modeling tool to capture logical designs, manual partitioning in some cases, and letting database engines do the work. It also covers physical design decisions like SQL vs NoSQL, row vs column storage, partitioning, indexing and optimizing data loads. Regular monitoring of workloads, bottlenecks and ratios is advised to tune performance.
Hadoop is one of the booming and innovative data analytics technologies that can effectively handle Big Data problems while achieving data security. It is an open-source, trending technology that covers data collection, data processing, and data analytics using HDFS (Hadoop Distributed File System) and MapReduce algorithms.
MySQL conference 2010 ignite talk on InfiniDB (Calpont)
InfiniDB is a column-oriented database engine that scales up across CPU cores and scales out across multiple nodes. It provides high performance for analytics, data warehousing, and read-intensive applications. Tests showed InfiniDB used less space, loaded data faster, and had significantly faster total and average query times compared to row-oriented databases. InfiniDB also showed predictable linear performance gains as data and nodes were increased.
Machine learning can be distributed across multiple machines to allow for processing of large datasets and complex models. There are three main approaches to distributed machine learning: data parallel, where the data is partitioned across machines and models are replicated; model parallel, where different parts of large models are distributed; and graph parallel, where graphs and algorithms are partitioned. Distributed frameworks use these approaches to efficiently and scalably train machine learning models on big data in parallel.
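A toy sketch of the data-parallel approach follows: the data is split across simulated workers, each computes a local gradient for a linear model, and the driver averages the gradients. The model, learning rate, and partitioning are illustrative assumptions.

```python
# Toy data-parallel training: each "worker" computes a gradient on its own
# partition, and the driver averages the gradients (single process, illustrative).
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(8_000, 3))
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w + 0.1 * rng.normal(size=8_000)

def local_gradient(Xp, yp, w):
    """Gradient of mean squared error on one worker's partition."""
    residual = Xp @ w - yp
    return 2 * Xp.T @ residual / len(yp)

partitions = list(zip(np.array_split(X, 4), np.array_split(y, 4)))  # 4 workers
w = np.zeros(3)
for step in range(200):
    grads = [local_gradient(Xp, yp, w) for Xp, yp in partitions]  # map (parallelizable)
    w -= 0.1 * np.mean(grads, axis=0)                             # reduce + update
print(w)   # approaches [2.0, -1.0, 0.5]
```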
Building next generation data warehouses (Alex Meadows)
All Things Open 2016 Talk - discussing technologies used to augment traditional data warehousing. Those technologies are:
* data vault
* anchor modeling
* linked data
* NoSQL
* data virtualization
* textual disambiguation
Data Warehouse Logical Design using Mysql (HAFIZ Islam)
Bert Scalzo discusses optimizing data warehouse performance in MySQL. Key points include:
- Designing star schemas with fact and dimension tables, indexing dimensions, and normalizing dimensions.
- Tuning involves choosing storage engines like MyISAM or InnoDB, configuring MySQL, and designing indexes.
- Data loading requires staging tables and recreating indexes after loads. Analyzing tables updates statistics.
- Query style affects performance - simple joins are best. The explain plan and its cost indicate query efficiency. With the right design, MySQL can effectively handle large data warehouses.
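As a minimal sketch of the fact-to-dimension join pattern described above, the snippet below builds a tiny star schema in SQLite (a stand-in for MySQL); the table and column names are illustrative assumptions.

```python
# Tiny star schema and a simple fact-to-dimension join (illustrative only).
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE fact_sales (date_key INTEGER, product_key INTEGER, amount REAL);
INSERT INTO dim_date VALUES (1, 2024, 1), (2, 2024, 2);
INSERT INTO dim_product VALUES (10, 'widgets'), (20, 'gadgets');
INSERT INTO fact_sales VALUES (1, 10, 99.0), (1, 20, 25.0), (2, 10, 42.0);
""")

# Join the fact table to its dimensions, aggregated by category and month.
rows = con.execute("""
SELECT p.category, d.month, SUM(f.amount)
FROM fact_sales f
JOIN dim_date d    ON f.date_key = d.date_key
JOIN dim_product p ON f.product_key = p.product_key
GROUP BY p.category, d.month
""").fetchall()
print(rows)
```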
OLAP tools are categorized based on how they store and process multi-dimensional data, with the main categories being MOLAP, ROLAP, HOLAP, and DOLAP. MOLAP uses specialized data structures and MDDBMS to organize and analyze aggregated data for optimal query performance. ROLAP uses relational databases with a metadata layer to facilitate multiple views of data. HOLAP combines aspects of MOLAP and ROLAP. DOLAP provides limited analysis directly from databases or via servers to desktops in the form of datacubes for local storage, analysis and maintenance.
This document outlines the steps for building a data warehouse, including: 1) extracting transactional data from various sources, 2) transforming the data to relate tables and columns, 3) loading the transformed data into a dimensional database to improve query performance, 4) building pre-calculated summary values using SQL Server Analysis Services to speed up report generation, and 5) building a front-end reporting tool for end users to easily fetch required information.
The document discusses different types of distributed computing including distributed supercomputing, high-throughput computing, on-demand computing, data-intensive computing, and collaborative computing. It provides examples of tasks for each type and challenges involved such as scheduling resources, scalability, and performance across heterogeneous systems. Specific examples mentioned include climate modeling, computational chemistry, parameter studies, cryptographic problems, high energy physics data analysis, and collaborative exploration of large geophysical datasets.
Working with the vast variety of data out there can be a huge challenge for organizations. We believe that a “one size does not fit all” solution is required to work with such data. The BigDAWG polystore is a federated DB system for multiple, disparate data models. It supports the notions of location transparency and semantic completeness through islands of information which support a data model, query language and candidate set of DB engines. A prototype of the BigDAWG system has shown great promise when applied to diverse medical data.
hbaseconasia2019 Bridging the Gap between Big Data System Software Stack and ... (Michael Stack)
Huan-Ping Su (蘇桓平), Yi-Sheng Lien (連奕盛) National Cheng Kung University
Track 2: Ecology and Solutions
https://open.mi.com/conference/hbasecon-asia-2019
THE COMMUNITY EVENT FOR APACHE HBASE™
July 20th, 2019 - Sheraton Hotel, Beijing, China
https://hbase.apache.org/hbaseconasia-2019/
1. Introduction to the Course "Designing Data Bases with Advanced Data Models... (Fabio Fumarola)
Information technology has led us into an era where the production, sharing, and use of information are part of everyday life, often without our being aware of it: it is now almost impossible not to leave a digital trail of many of the actions we perform every day, for example through digital content such as photos, videos, blog posts, and everything that revolves around social networks (Facebook and Twitter in particular). Added to this, with the "Internet of Things" we see an increase in devices such as watches, bracelets, thermostats, and many other items that can connect to the network and therefore generate large data streams. This explosion of data justifies the birth of the term Big Data: it denotes data produced in large quantities, at remarkable speed, and in different formats, which requires processing technologies and resources that go far beyond conventional data management and storage systems. It is immediately clear that 1) data storage models based on the relational model, and 2) processing systems based on stored procedures and computation on grids, are not applicable in these contexts. Regarding point 1, RDBMSs, widely used for a great variety of applications, run into problems when the amount of data grows beyond certain limits. Scalability and implementation cost are only part of the disadvantages: very often, when faced with managing big data, variability, that is, the lack of a fixed structure, also represents a significant problem. This has given a boost to the development of NoSQL databases. The NoSQL Databases website defines NoSQL databases as "Next Generation Databases mostly addressing some of the points: being non-relational, distributed, open source and horizontally scalable." These databases are distributed, open source, horizontally scalable, schema-free (key-value, column-oriented, document-based, and graph-based), easily replicable, lack ACID guarantees, and can handle large amounts of data. They are typically integrated with processing tools based on the MapReduce paradigm proposed by Google; MapReduce, together with the open source Hadoop framework, represents the new model for distributed processing of large amounts of data and supplants techniques based on stored procedures and computational grids (point 2). The relational model taught in basic database design courses has many limitations compared to the demands posed by new applications that use Big Data and NoSQL databases to store data and MapReduce to process large amounts of data.
Course Website http://pbdmng.datatoknowledge.it/
OLAP (Online Analytical Processing) is a technology that uses a multidimensional view of aggregated data to provide quicker access to strategic information and help with decision making. It has four main characteristics: using multidimensional data analysis techniques, providing advanced database support, offering easy-to-use end user interfaces, and supporting client/server architecture. A key aspect is representing data in a multidimensional structure that allows for consolidation and aggregation of data at different levels.
The document discusses MapReduce and the Hadoop framework. It provides an overview of how MapReduce works, examples of problems it can solve, and how Hadoop implements MapReduce at scale across large clusters in a fault-tolerant manner using the HDFS distributed file system and YARN resource management.
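To ground the MapReduce discussion, here is a single-process word-count sketch of the map, shuffle, and reduce phases; Hadoop runs the same pattern across a cluster with HDFS and YARN, which this toy code does not attempt.

```python
# Single-process sketch of the MapReduce word-count pattern (illustrative only).
from collections import defaultdict
from itertools import chain

def map_phase(line):
    """Emit (word, 1) pairs for one input record."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    return key, sum(values)

lines = ["the quick brown fox", "the lazy dog", "the fox"]
mapped = chain.from_iterable(map_phase(l) for l in lines)              # map
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())  # shuffle + reduce
print(counts)   # {'the': 3, 'quick': 1, ...}
```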
SciMATE: A Novel MapReduce-Like Framework for Multiple Scientific Data Formats (Qian Lin)
1) SciMATE is a novel MapReduce-like framework that supports multiple scientific data formats like NetCDF and HDF5 for in-situ data analysis without needing to reload data.
2) It provides a customizable data adapter layer and optimized data access strategies for patterns like strided, column, and discrete reads to improve performance.
3) An evaluation showed SciMATE has good thread and node scalability for data processing and loading and that contiguous column reads outperform fixed-size column reads.
1) Statistics Denmark provides statistical data through a centralized dissemination system called StatBank, which hosts 1,500 tables covering all subjects. StatBank allows users to access and download data in a variety of formats for free.
2) While some data inputs and management are decentralized, output is centralized through StatBank to ensure coordination of structure, formatting and simultaneous releases.
3) Principles of the dissemination system include prioritizing electronic access over paper, using StatBank as the single source of official statistics for all publications and target audiences. Metadata is also centralized and stored once for reuse across outputs.
A Common Database Approach for OLTP and OLAP Using an In-Memory Column Database (Ishara Amarasekera)
This presentation was prepared by Ishara Amarasekera based on the paper, A Common Database Approach for OLTP and OLAP Using an In-Memory Column Database by Hasso Plattner.
This presentation contains a summary of the content provided in this research paper and was presented as a paper discussion for the course, Advanced Database Systems in Computer Science.
Hadoop is one of the booming and innovative data analytics technologies that can effectively handle Big Data problems while achieving data security. It is an open-source, trending technology that covers data collection, data processing, and data analytics using HDFS (Hadoop Distributed File System) and MapReduce algorithms.
OLAP tools enable interactive analysis of multidimensional data from multiple perspectives. There are three main types of OLAP tools: ROLAP, MOLAP, and HOLAP. ROLAP uses relational databases and SQL queries, while MOLAP pre-computes and stores aggregated data in multidimensional arrays for fast querying. HOLAP is a hybrid that stores some data in ROLAP and some in MOLAP to optimize both query performance and cube processing time.
Visualizing big data in the browser using spark (Databricks)
This document discusses using Spark to enable interactive visualization of big data in the browser. Spark can help address challenges of manipulating large datasets by caching data in memory to reduce latency, increasing parallelism, and summarizing, modeling, or sampling large datasets to reduce the number of data points. The goal is to put visualization back into the normal workflow of data analysis regardless of data size and enable sharing and collaboration through interactive and reproducible visualizations in the browser.
20160331 sa introduction to big data pipelining berlin meetup 0.3 (Simon Ambridge)
This document discusses building data pipelines with Apache Spark and DataStax Enterprise (DSE) for both static and real-time data. It describes how DSE provides a scalable, fault-tolerant platform for distributed data storage with Cassandra and real-time analytics with Spark. It also discusses using Kafka as a messaging queue for streaming data and processing it with Spark. The document provides examples of using notebooks, Parquet, and Akka for building pipelines to handle both large static datasets and fast, real-time streaming data sources.
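A minimal sketch of the Kafka-to-Spark leg of such a pipeline is shown below using Spark Structured Streaming; the broker address and topic name are placeholders, the spark-sql-kafka connector package is assumed to be available, and the Cassandra/DSE sink is omitted.

```python
# Minimal Structured Streaming sketch: consume a Kafka topic and print it.
# Broker address and topic name are placeholders; a real pipeline would write
# to Cassandra/DSE instead of the console.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.appName("kafka-pipeline").getOrCreate()

events = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092")   # placeholder broker
    .option("subscribe", "events")                          # placeholder topic
    .load()
    .select(col("key").cast("string"), col("value").cast("string"))
)

query = (
    events.writeStream
    .format("console")
    .outputMode("append")
    .start()
)
query.awaitTermination()
```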
A Maturing Role of Workflows in the Presence of Heterogenous Computing Archit... (Ilkay Altintas, Ph.D.)
Scientific workflows are used by many scientific communities to capture, automate and standardize computational and data practices in science. Workflow-based automation is often achieved through a craft that combines people, process, computational and Big Data platforms, application-specific purpose and programmability, leading to provenance-aware archival and publication of the results. This talk summarizes varying and changing requirements for distributed workflows influenced by Big Data and heterogeneous computing architectures and presents a methodology for workflow-driven science based on these maturing requirements.
Jeremy Engle's slides from Redshift / Big Data meetup on July 13, 2017 (AWS Chicago)
"Strategies for supporting near real time analytics, OLAP, and interactive data exploration" - Dr. Jeremy Engle, Engineering Manager Data Team at Jellyvision
Transforming Data Architecture Complexity at Sears - StampedeCon 2013 (StampedeCon)
At the StampedeCon 2013 Big Data conference in St. Louis, Justin Sheppard discussed Transforming Data Architecture Complexity at Sears. High ETL complexity and costs, data latency and redundancy, and batch window limits are just some of the IT challenges caused by traditional data warehouses. Gain an understanding of big data tools through the use cases and technology that enables Sears to solve the problems of the traditional enterprise data warehouse approach. Learn how Sears uses Hadoop as a data hub to minimize data architecture complexity – resulting in a reduction of time to insight by 30-70% – and discover “quick wins” such as mainframe MIPS reduction.
Anusua Trivedi, Data Scientist at Texas Advanced Computing Center (TACC), UT ... (MLconf)
Building a Recommender System for Publications using Vector Space Model and Python: In recent years, it has become very common to have access to a large number of publications on similar or related topics. Recommendation systems for publications are needed to locate appropriate published articles among a large number of publications on the same or similar topics. In this talk, I will describe a recommender system framework for PubMed articles. PubMed is a free search engine that primarily accesses the MEDLINE database of references and abstracts on life-sciences and biomedical topics. The proposed recommender system produces two types of recommendations: (i) content-based recommendations and (ii) recommendations based on similarities with other users’ search profiles. The first type, content-based recommendation, can efficiently search for material that is similar in context or topic to the input publication. The second mechanism generates recommendations using the search history of users whose search profiles match the current user. The content-based recommendation system uses a Vector Space Model to rank PubMed articles based on the similarity of content items. To implement the second recommendation mechanism, we use Python libraries and frameworks: we find the profile similarity of users and recommend additional publications based on the history of the most similar user. In the talk I will present the background and motivation for these recommendation systems, and discuss the implementation of this PubMed recommendation system with examples.
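As a hedged sketch of the content-based part, the snippet below ranks documents against a query by cosine similarity of TF-IDF vectors (a Vector Space Model) using scikit-learn; the toy abstracts are illustrative, not PubMed data or the talk's actual code.

```python
# Content-based ranking with a Vector Space Model: TF-IDF + cosine similarity.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

abstracts = [
    "deep learning for tumor segmentation in MRI",
    "statistical methods for clinical trial design",
    "convolutional networks applied to radiology images",
]
query = "neural networks for medical imaging"

vectorizer = TfidfVectorizer(stop_words="english")
doc_matrix = vectorizer.fit_transform(abstracts)   # documents as TF-IDF vectors
query_vec = vectorizer.transform([query])

scores = cosine_similarity(query_vec, doc_matrix).ravel()
ranked = sorted(zip(scores, abstracts), reverse=True)
for score, text in ranked:
    print(f"{score:.2f}  {text}")
```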
This talk will cover, via live demo & code walk-through, the key lessons we’ve learned while building such real-world software systems over the past few years. We’ll incrementally build a hybrid machine learned model for fraud detection, combining features from natural language processing, topic modeling, time series analysis, link analysis, heuristic rules & anomaly detection. We’ll be looking for fraud signals in public email datasets, using Python & popular open-source libraries for data science and Apache Spark as the compute engine for scalable parallel processing.
Deep Learning on Apache® Spark™: Workflows and Best Practices (Databricks)
The combination of Deep Learning with Apache Spark has the potential for tremendous impact in many sectors of the industry. This webinar, based on the experience gained in assisting customers with the Databricks Virtual Analytics Platform, will present some best practices for building deep learning pipelines with Spark.
Rather than comparing deep learning systems or specific optimizations, this webinar will focus on issues that are common to deep learning frameworks when running on a Spark cluster, including:
* optimizing cluster setup;
* configuring the cluster;
* ingesting data; and
* monitoring long-running jobs.
We will demonstrate the techniques we cover using Google’s popular TensorFlow library. More specifically, we will cover typical issues users encounter when integrating deep learning libraries with Spark clusters.
Clusters can be configured to avoid task conflicts on GPUs and to allow using multiple GPUs per worker. Setting up pipelines for efficient data ingest improves job throughput, and monitoring facilitates both the work of configuration and the stability of deep learning jobs.
Deep Learning on Apache® Spark™: Workflows and Best Practices (Jen Aman)
This document summarizes a presentation about deep learning workflows and best practices on Apache Spark. It discusses how deep learning fits within broader data pipelines for tasks like training and transformation. It also outlines recurring patterns for integrating Spark and deep learning frameworks, including using Spark for data parallelism and embedding deep learning transforms. The presentation provides tips for developers on topics like using GPUs with PySpark and monitoring deep learning jobs. It concludes by discussing challenges in the areas of distributed deep learning and Spark integration.
WorDS of Data Science in the Presence of Heterogenous Computing Architectures (Ilkay Altintas, Ph.D.)
ISUM 2015 Keynote
Summary: Computational and Data Science is about extracting knowledge from data and modeling. This end goal can only be achieved through a craft that combines people, processes, computational and Big Data platforms, application-specific purpose, and programmability. Publications and the provenance of the data products leading to these publications are also important. With this in mind, this talk defines a terminology for computational and data science applications, and discusses why focusing on these concepts is important for executability and reproducibility in computational and data science.
Haytham ElFadeel presented on next-generation storage systems and key-value stores. He began with an overview of scalable systems and the need for both vertical and horizontal scalability. He discussed the limitations of traditional databases in scaling, including complexity, wasted features, and multi-step query processing. Key-value stores were presented as an alternative, offering simple interfaces and designs optimized for scaling across hundreds of machines. Performance comparisons showed key-value stores significantly outperforming databases. Systems discussed included Amazon Dynamo, Facebook Cassandra, and Redis.
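To illustrate the partitioning idea behind Dynamo-style key-value stores, here is a small consistent-hashing sketch in Python; the node names and virtual-node count are illustrative assumptions, not any specific system's implementation.

```python
# Consistent hashing: the core partitioning scheme behind Dynamo-style stores.
import bisect
import hashlib

class ConsistentHashRing:
    def __init__(self, nodes, vnodes=100):
        self.ring = []                          # sorted list of (hash, node)
        for node in nodes:
            for i in range(vnodes):             # virtual nodes smooth the load
                h = self._hash(f"{node}#{i}")
                bisect.insort(self.ring, (h, node))

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def node_for(self, key):
        """Walk clockwise on the ring to the first node at or after the key's hash."""
        h = self._hash(key)
        idx = bisect.bisect(self.ring, (h, "")) % len(self.ring)
        return self.ring[idx][1]

ring = ConsistentHashRing(["node-a", "node-b", "node-c"])
for key in ["user:1", "user:2", "session:xyz"]:
    print(key, "->", ring.node_for(key))
```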
This webinar discusses tools for making big data easy to work with. It covers MetaScale Expertise, which provides Hadoop expertise and case studies. Kognitio Analytics is discussed as a way to accelerate Hadoop for organizations. The webinar agenda includes an introduction, presentations on MetaScale and Kognitio, and a question and answer session. Rethinking data strategies with Hadoop and using in-memory analytics are presented as ways to gain insights from large, diverse datasets.
Parallel Distributed Deep Learning on HPCC Systems (HPCC Systems)
As part of the 2018 HPCC Systems Community Day event:
The training process for modern deep neural networks requires big data and large amounts of computational power. Combining HPCC Systems and Google’s TensorFlow, Robert created a parallel stochastic gradient descent algorithm to provide a basis for future deep neural network research, thereby helping to enhance the distributed neural network training capabilities of HPCC Systems.
Robert Kennedy is a first year Ph.D. student in CS at Florida Atlantic University with research interests in Deep Learning and parallel and distributed computing. His current research is in improving distributed deep learning by implementing and optimizing distributed algorithms.
A podium abstract presented at AMIA 2016 Joint Summits on Translational Science. This discusses Data Café — A Platform For Creating Biomedical Data Lakes.
This document provides an outline and introduction for a lecture on MapReduce and Hadoop. It discusses Hadoop architecture, including HDFS and YARN, and how they work together to provide distributed storage and processing of big data across clusters of machines. It also provides an overview of the MapReduce programming model and how data is processed through the map and reduce phases in Hadoop. It references several books on Hadoop, MapReduce, and big data fundamentals.
Using Crowdsourced Images to Create Image Recognition Models with Analytics Z... (Maurice Nsabimana)
Volunteers around the world increasingly act as human sensors to collect millions of data points. A team from the World Bank trained deep learning models, using Apache Spark and BigDL, to confirm that photos gathered through a crowdsourced data collection pilot matched the goods for which observations were submitted.
In this talk, Maurice Nsabimana, a statistician at the World Bank, and Jiao Wang, a software engineer on the Big Data Technology team at Intel, demonstrate a collaborative project to design and train large-scale deep learning models using crowdsourced images from around the world. BigDL is a distributed deep learning library designed from the ground up to run natively on Apache Spark. It enables data engineers and scientists to write deep learning applications in Scala or Python as standard Spark programs, without having to explicitly manage distributed computations. Attendees of this session will learn how to get started with BigDL, which runs in any Apache Spark environment, whether on-premise or in the cloud.
The webinar discusses how organizations can make big data easy to use with the right tools and talent. It presents on MetaScale's expertise in helping Sears Holdings implement Hadoop and how Kognitio's in-memory analytics platform can accelerate Hadoop for organizations. The webinar agenda includes an introduction, a case study on Sears Holdings' Hadoop implementation, an explanation of how Kognitio's platform accelerates Hadoop, and a Q&A session.
Data Lake Acceleration vs. Data Virtualization - What’s the difference? (Denodo)
Watch full webinar here: https://bit.ly/3hgOSwm
Data Lake technologies have been in constant evolution in recent years, with each iteration promising to fix what previous ones failed to accomplish. Several data lake engines are hitting the market with better ingestion, governance, and acceleration capabilities that aim to create the ultimate data repository. But isn't that the promise of a logical architecture with data virtualization too? So, what’s the difference between the two technologies? Are they friends or foes? This session will explore the details.
This document contains a presentation on using graph databases for recommendations. It begins with an introduction to graphs and graph theory, then discusses what graph databases are and how they are different from relational databases. It explains how graphs are well-suited for complex querying and representing connected data. The presentation describes how recommendation systems work and how graph algorithms and storing recommendation data in a graph structure provide benefits like real-time recommendations, navigating relationships between items, and efficient operations. It concludes with a demonstration, examples, and discussing future events.
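As a toy illustration of the "customers who bought this also bought" traversal that graph databases answer natively, here is a small in-memory sketch in Python; the purchase data is illustrative, and a real graph database would express this as a graph query over stored relationships.

```python
# Tiny in-memory graph and a two-hop "also bought" traversal (illustrative).
from collections import Counter, defaultdict

purchases = [  # (user, item) edges; illustrative data
    ("alice", "book"), ("alice", "lamp"),
    ("bob", "book"), ("bob", "desk"),
    ("carol", "book"), ("carol", "lamp"), ("carol", "chair"),
]

bought_by = defaultdict(set)   # item -> users who bought it
bought = defaultdict(set)      # user -> items they bought
for user, item in purchases:
    bought_by[item].add(user)
    bought[user].add(item)

def also_bought(item, limit=3):
    """Two-hop traversal: item -> users who bought it -> their other items."""
    counts = Counter()
    for user in bought_by[item]:
        for other in bought[user] - {item}:
            counts[other] += 1
    return counts.most_common(limit)

print(also_bought("book"))   # 'lamp' ranks first; tie order of the rest may vary
```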
Similar to Data replication and synchronization tool (20)
Google Summer of Code (GSoC) is a remote open-source internship program funded by Google, in which contributors work with an open source organization (and get paid) over a summer.
https://kkpradeeban.blogspot.com/2022/11/google-summer-of-code-gsoc-2023.html
GSoC 2022 comes with more changes and flexibility. This presentation aims to give an introduction to the contributors and what to expect this summer.
https://kkpradeeban.blogspot.com/2022/01/google-summer-of-code-gsoc-2022.html
This document provides information about Google Summer of Code (GSoC) 2022. It discusses why students should participate in GSoC, the application timeline and process, tips for finding projects and communicating with mentors, expectations during the coding and evaluation periods, and opportunities to continue contributing to open source projects after GSoC. The overall goal is to help potential contributors understand what is required to be accepted into and succeed in GSoC.
Niffler is an efficient DICOM Framework for machine learning pipelines and processing workflows on metadata. It facilitates efficient transfer of DICOM images on-demand and real-time from PACS to the research environments, to run processing workflows and machine learning pipelines.
https://github.com/Emory-HITI/Niffler/
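As a hedged sketch of the kind of metadata extraction such a pipeline performs, the snippet below reads DICOM headers with pydicom; it is not Niffler's own code, and the file path is a placeholder.

```python
# Read DICOM metadata only (no pixel data); "example.dcm" is a placeholder path.
import pydicom

ds = pydicom.dcmread("example.dcm", stop_before_pixels=True)
record = {
    "StudyInstanceUID": ds.get("StudyInstanceUID"),
    "Modality": ds.get("Modality"),
    "StudyDate": ds.get("StudyDate"),
}
print(record)
```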
This is an introductory presentation to GSoC 2021. This year there were a few specific changes to GSoC compared to past years. Specifically, the workload and the student stipend were halved in 2021 compared to previous years.
We propose Niffler (https://github.com/Emory-HITI/Niffler), an open-source ML framework that runs in research clusters by receiving images in real-time using the DICOM protocol from hospitals' PACS.
This presentation aims to introduce GSoC to new mentors and mentoring organizations. More details - https://kkpradeeban.blogspot.com/2019/12/google-summer-of-code-gsoc-2020-for.html
An introductory presentation to Google Summer of Code (GSoC), focusing on the year 2020. More information can be found at https://kkpradeeban.blogspot.com/search/label/GSoC
The diversity of data management systems affords developers the luxury of building heterogeneous architectures to address the unique needs of big data. It allows one to mix-n-match systems that can store, query, update, and process data based on specific use cases. However, this heterogeneity brings with it the burden of developing custom interfaces for each data management system. Existing big data frameworks fall short in mitigating these challenges. In this paper, we present Bindaas, a secure and extensible big data middleware that offers uniform access to diverse data sources. By providing a RESTful web service interface to the data sources, Bindaas exposes query, update, store, and delete functionality of the data sources as data service APIs, while providing turn-key support for standard operations involving access control and audit-trails. The research community has deployed Bindaas in various production environments in healthcare. Our evaluations highlight the efficiency of Bindaas in serving concurrent requests to data source instances with minimal overheads.
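To illustrate the general pattern of exposing a stored query as a data service API, here is a minimal Python/Flask sketch; Bindaas itself is Java-based middleware, and the endpoint, database file, and schema below are illustrative assumptions, not its API.

```python
# Minimal stand-in for a data service API: one stored query behind a REST endpoint.
from flask import Flask, jsonify, request
import sqlite3

app = Flask(__name__)

@app.route("/datasets/patients", methods=["GET"])
def query_patients():
    """Expose a parameterized query as a data service with a simple filter."""
    min_age = int(request.args.get("min_age", 0))
    con = sqlite3.connect("clinical.db")     # placeholder database file and schema
    rows = con.execute(
        "SELECT id, age FROM patients WHERE age >= ?", (min_age,)
    ).fetchall()
    return jsonify([{"id": r[0], "age": r[1]} for r in rows])

if __name__ == "__main__":
    app.run(port=8080)
```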
This is the 2nd defense of my Ph.D. double degree.
More details - https://kkpradeeban.blogspot.com/2019/08/my-phd-defense-software-defined-systems.html
The presentation slides of my Ph.D. thesis. For more information - https://kkpradeeban.blogspot.com/2019/07/my-phd-defense-software-defined-systems.html
The presentation slides of my Ph.D. thesis proposal (known as the "CAT" at my university). I received a score of 18/20.
Supervisors:
Prof. Luís Veiga (IST, ULisboa)
Prof. Peter Van Roy (UCLouvain)
Jury:
Prof. Javid Taheri (Karlstad University)
Prof. Fernando Mira da Silva (IST, ULisboa)
This is my presentation at IFIP Networking 2018 in Zurich.
In this paper, we propose a cloud-assisted network as an alternative connectivity provider.
More details: https://kkpradeeban.blogspot.com/2018/05/moving-bits-with-fleet-of-shared.html
Services that access or process a large volume of data are known as data services. Big data frameworks consist of diverse storage media and heterogeneous data formats. Through their service-based approach, data services offer a standardized execution model to big data frameworks. Software-Defined Networking (SDN) increases the programmability of the network, by unifying the control plane centrally, away from the distributed data plane devices. In this paper, we present Software-Defined Data Services (SDDS), extending the data services with the SDN paradigm. SDDS consists of two aspects. First, it models the big data executions as data services or big services composed of several data services. Then, it orchestrates the services centrally in an interoperable manner, by logically separating the executions from the storage. We present the design of an SDDS orchestration framework for network-aware big data executions in data centers. We then evaluate the performance of SDDS through microbenchmarks on a prototype implementation. By extending SDN beyond data centers, we can deploy SDDS in broader execution environments.
https://kkpradeeban.blogspot.com/2018/04/software-defined-data-services.html
This is my presentation at the DMAH workshop, held in conjunction with VLDB'17. It describes my work during my stay at Emory BMI.
More information: https://kkpradeeban.blogspot.com/2017/08/on-demand-service-based-big-data.html
This is a poster I presented at the ACRO Summer School at Karlstad University. It presents my PhD work.
More details: http://kkpradeeban.blogspot.com/2017/07/my-first-polygonal-journey.html
This is the presentation I gave to the audience of the EMJD-DC Spring Event 2017 in Brussels to discuss my research. http://kkpradeeban.blogspot.be/2017/05/emjd-dc-spring-event-2017.html
This document summarizes the PhD work of Pradeeban Kathiravelu on improving scalability and resilience in multi-tenant distributed clouds. It describes two approaches: 1) SMART uses SDN to provide differentiated quality of service and service level agreements by dynamically diverting and cloning priority network flows. 2) Mayan componentizes big data services as microservices that can be executed in a network-aware and scalable way across distributed clouds. Evaluation shows these approaches improve speedup and ensure SLAs for critical flows compared to network-agnostic distributed execution.
EWOCS-I: The catalog of X-ray sources in Westerlund 1 from the Extended Westerlund 1 and 2 Open Clusters Survey (Sérgio Sacani)
Context. With a mass exceeding several 10^4 M⊙ and a rich and dense population of massive stars, supermassive young star clusters represent the most massive star-forming environment that is dominated by the feedback from massive stars and gravitational interactions among stars.
Aims. In this paper we present the Extended Westerlund 1 and 2 Open Clusters Survey (EWOCS) project, which aims to investigate
the influence of the starburst environment on the formation of stars and planets, and on the evolution of both low and high mass stars.
The primary targets of this project are Westerlund 1 and 2, the closest supermassive star clusters to the Sun.
Methods. The project is based primarily on recent observations conducted with the Chandra and JWST observatories. Specifically,
the Chandra survey of Westerlund 1 consists of 36 new ACIS-I observations, nearly co-pointed, for a total exposure time of 1 Msec.
Additionally, we included 8 archival Chandra/ACIS-S observations. This paper presents the resulting catalog of X-ray sources within
and around Westerlund 1. Sources were detected by combining various existing methods, and photon extraction and source validation
were carried out using the ACIS-Extract software.
Results. The EWOCS X-ray catalog comprises 5963 validated sources out of the 9420 initially provided to ACIS-Extract, reaching a photon flux threshold of approximately 2 × 10^−8 photons cm^−2 s^−1. The X-ray sources exhibit a highly concentrated spatial distribution,
with 1075 sources located within the central 1 arcmin. We have successfully detected X-ray emissions from 126 out of the 166 known
massive stars of the cluster, and we have collected over 71 000 photons from the magnetar CXO J164710.20-455217.
This presentation gives a brief overview of the structural and functional attributes of nucleotides, the structure and function of genetic material, and the impact of UV rays and pH upon them.
The binding of cosmological structures by massless topological defectsSérgio Sacani
Assuming spherical symmetry and weak field, it is shown that if one solves the Poisson equation or the Einstein field
equations sourced by a topological defect, i.e. a singularity of a very specific form, the result is a localized gravitational
field capable of driving flat rotation (i.e. Keplerian circular orbits at a constant speed for all radii) of test masses on a thin
spherical shell without any underlying mass. Moreover, a large-scale structure which exploits this solution by assembling
concentrically a number of such topological defects can establish a flat stellar or galactic rotation curve, and can also deflect
light in the same manner as an equipotential (isothermal) sphere. Thus, the need for dark matter or modified gravity theory is
mitigated, at least in part.
The thematic appreciation test is a psychological assessment tool used to measure an individual's appreciation and understanding of specific themes or topics. This test helps to evaluate an individual's ability to connect different ideas and concepts within a given theme, as well as their overall comprehension and interpretation skills. The results of the test can provide valuable insights into an individual's cognitive abilities, creativity, and critical thinking skills.
Nucleophilic Addition of carbonyl compounds.pptxSSR02
Nucleophilic addition is the most important reaction of carbonyl compounds: not just aldehydes and ketones, but carboxylic acid derivatives in general.
Carbonyls undergo addition reactions with a large range of nucleophiles.
Comparing the relative basicity of the nucleophile and the product is extremely helpful in determining how reversible the addition reaction is. Reactions with Grignards and hydrides are irreversible. Reactions with weak bases like halides and carboxylates generally don’t happen.
Electronic effects (inductive effects, electron donation) have a large impact on reactivity.
Large groups adjacent to the carbonyl will slow the rate of reaction.
Neutral nucleophiles can also add to carbonyls, although their additions are generally slower and more reversible. Acid catalysis is sometimes employed to increase the rate of addition.
This PowerPoint presentation, generated from an MS Word document, covers the major details of the micronucleus test, its significance, and the assays used to conduct it. The test detects micronucleus formation inside the cells of nearly every multicellular organism; micronuclei arise during chromosome segregation, when lagging chromosomes or fragments are left out of the daughter nuclei.
The ability to recreate computational results with minimal effort and actionable metrics provides a solid foundation for scientific research and software development. When people can replicate an analysis at the touch of a button using open-source software, open data, and methods to assess and compare proposals, it significantly eases verification of results, engagement with a diverse range of contributors, and progress. However, we have yet to fully achieve this; there are still many sociotechnical frictions.
Inspired by David Donoho's vision, this talk aims to revisit the three crucial pillars of frictionless reproducibility (data sharing, code sharing, and competitive challenges) with the perspective of deep software variability.
Our observation is that multiple layers — hardware, operating systems, third-party libraries, software versions, input data, compile-time options, and parameters — are subject to variability that exacerbates frictions but is also essential for achieving robust, generalizable results and fostering innovation. I will first review the literature, providing evidence of how the complex variability interactions across these layers affect qualitative and quantitative software properties, thereby complicating the reproduction and replication of scientific studies in various fields.
I will then present some software engineering and AI techniques that can support the strategic exploration of variability spaces. These include the use of abstractions and models (e.g., feature models), sampling strategies (e.g., uniform, random), cost-effective measurements (e.g., incremental build of software configurations), and dimensionality reduction methods (e.g., transfer learning, feature selection, software debloating).
I will finally argue that deep variability is both the problem and solution of frictionless reproducibility, calling the software science community to develop new methods and tools to manage variability and foster reproducibility in software systems.
Invited talk at the Journées Nationales du GDR GPL 2024.
1. Data Replication and Synchronization Tool
Ashish Sharma
Pradeeban Kathiravelu
2. Introduction
• Data is huge.
• Consumers often share a subset of data with others.
– Pointers to the data, actually.
• Medical data is structured in hierarchies.
3. Motivation
• Creating and sharing pointers to interesting subsets of data.
• Data Sharing Synchronization System
– Fault-tolerant.
– In-memory.
• Generic, while targeting medical images and metadata.
– The Cancer Imaging Archive (TCIA).
4. Solution Architecture
• Users create, share, and update replica sets from a data source.
• Infinispan In-Memory Data Grid (version 6.0.2) to store the replica sets (see the sketch below).
Fig 1. Deployment Architecture
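To make the storage concrete, here is a minimal sketch of how replica sets of pointers could be kept in an embedded Infinispan cache. It uses the Infinispan 6.x embedded API; the cache name, the ReplicaSet class, and the pointer values are illustrative assumptions, not code from the tool itself.

import org.infinispan.Cache;
import org.infinispan.configuration.cache.CacheMode;
import org.infinispan.configuration.cache.ConfigurationBuilder;
import org.infinispan.manager.DefaultCacheManager;

import java.io.Serializable;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class ReplicaSetStore {

    // Hypothetical value type: a named set of pointers (e.g. DICOM series identifiers)
    // into the data source; the images themselves are never copied into the grid.
    public static class ReplicaSet implements Serializable {
        final String name;
        final Set<String> pointers;
        ReplicaSet(String name, Set<String> pointers) {
            this.name = name;
            this.pointers = pointers;
        }
    }

    public static void main(String[] args) throws Exception {
        // Local, embedded cache manager; a clustered deployment would pass a
        // GlobalConfiguration with a JGroups transport instead (see the later sketch).
        DefaultCacheManager manager = new DefaultCacheManager();
        manager.defineConfiguration("replica-sets",
                new ConfigurationBuilder().clustering().cacheMode(CacheMode.LOCAL).build());

        Cache<String, ReplicaSet> replicaSets = manager.getCache("replica-sets");

        // A publisher creates and shares a replica set: only pointers are stored.
        replicaSets.put("chest-ct-subset",
                new ReplicaSet("chest-ct-subset",
                        new HashSet<>(Arrays.asList("series-0001", "series-0002"))));

        // A consumer looks the replica set up by name and resolves the pointers against TCIA.
        ReplicaSet shared = replicaSets.get("chest-ct-subset");
        System.out.println(shared.pointers);

        manager.stop();
    }
}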
5. Execution Flow
• Publisher-Consumer API to consume the replica sets and Data Provider API to communicate with the data source (sketched below).
Fig 2. Execution Flow
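The slide names the two integration points without showing their shape. The following is a hypothetical sketch of what the Publisher-Consumer and Data Provider APIs might look like; every interface and method name here is an assumption made for illustration, not taken from the tool.

import java.util.Set;
import java.util.function.Consumer;

/** Data Provider API (hypothetical): talks to the underlying data source, e.g. TCIA. */
interface DataProvider {
    /** Return the identifiers (pointers) of items matching a query against the data source. */
    Set<String> query(String collection, String filterExpression);

    /** Fetch the bytes behind a single pointer when a consumer actually needs the data. */
    byte[] retrieve(String pointer);
}

/** Publisher-Consumer API (hypothetical): publish replica sets and subscribe to their updates. */
interface ReplicaSetChannel {
    /** Publish or update a named replica set consisting of pointers into the data source. */
    void publish(String replicaSetName, Set<String> pointers);

    /** Register a consumer that is notified whenever the named replica set changes. */
    void subscribe(String replicaSetName, Consumer<Set<String>> onUpdate);
}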
7. Extensibility
• Not tightly coupled to the technology (see the sketch below).
– Other data grids: Hazelcast, Terracotta BigMemory, Oracle Coherence.
– Persistence: integration with SQL or NoSQL solutions such as MongoDB.
– Data sources other than TCIA.
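One common way to achieve this loose coupling is to hide the data grid behind a small storage interface, so that Infinispan, another grid, or a persistent store can be plugged in without touching the synchronization logic. The interface and in-memory implementation below are a hypothetical illustration, not part of the tool.

import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical storage abstraction: the synchronization logic would depend only on this
// interface, so Infinispan could be swapped for Hazelcast, Terracotta BigMemory,
// Oracle Coherence, or a persistent store such as MongoDB without touching the rest.
interface ReplicaSetBackend {
    void put(String replicaSetName, Set<String> pointers);
    Set<String> get(String replicaSetName);
}

// Trivial in-memory implementation, handy for tests; a production backend would wrap
// the client API of whichever data grid or database is chosen.
class InMemoryBackend implements ReplicaSetBackend {
    private final Map<String, Set<String>> store = new ConcurrentHashMap<>();
    public void put(String replicaSetName, Set<String> pointers) { store.put(replicaSetName, pointers); }
    public Set<String> get(String replicaSetName) { return store.get(replicaSetName); }
}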
8. What does Infinispan offer?
• High performance and scalability.
• Fault tolerance.
– Multiple nodes with TCP/IP- or multicast-based JGroups clustering configurations (configuration sketched below).
• Distributed execution.
– Optimized both for a single node as a local cache and for multi-node execution.
• MapReduce framework.
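As a rough illustration of the clustering bullet, this sketch configures an embedded Infinispan 6.x cache manager with the default JGroups transport and a distributed, synchronous cache, so replica sets survive the loss of a single node. The cluster name, cache name, and owner count are assumptions for the example.

import org.infinispan.Cache;
import org.infinispan.configuration.cache.CacheMode;
import org.infinispan.configuration.cache.ConfigurationBuilder;
import org.infinispan.configuration.global.GlobalConfigurationBuilder;
import org.infinispan.manager.DefaultCacheManager;

public class ClusteredReplicaSets {
    public static void main(String[] args) {
        // Default JGroups transport (UDP multicast). A TCP stack can be selected by
        // pointing the transport at a JGroups configuration file instead, e.g.
        // .transport().addProperty("configurationFile", "jgroups-tcp.xml").
        DefaultCacheManager manager = new DefaultCacheManager(
                new GlobalConfigurationBuilder()
                        .transport().defaultTransport().clusterName("replica-sync-cluster")
                        .build());

        // DIST_SYNC with two owners: each entry lives on two nodes, so losing one node
        // does not lose any replica set.
        manager.defineConfiguration("replica-sets",
                new ConfigurationBuilder()
                        .clustering().cacheMode(CacheMode.DIST_SYNC)
                        .hash().numOwners(2)
                        .build());

        Cache<String, String> cache = manager.getCache("replica-sets");
        cache.put("demo", "pointer-list");   // visible from every node in the cluster
        System.out.println(cache.get("demo"));
        manager.stop();
    }
}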
9. Thank you!