Pachyderm is a big data analytics platform deployed with Kubernetes and Docker. Pachyderm is inspired by the Hadoop ecosystem but shares no code with it. Instead, we leverage the container ecosystem to provide the broad functionality of Hadoop with the ease of use of Docker.
3. What is Pachyderm?
Big Data with Containers
• Version control for data
• Uses containers for data processing
• Batch and streaming processing
• Data lives in object storage (S3, GCS, Ceph)
• Shares no code with Hadoop
4. Intro to Containers
• What are containers and why are they useful for Big Data Applications?
• What is Kubernetes and why is it useful for Big Data Applications?