Chicago Data Summit: Apache HBase: An Introduction


Apache HBase is an open source distributed data-store capable of managing billions of rows of semi-structured data across large clusters of commodity hardware. HBase provides real-time random read-write access as well as integration with Hadoop MapReduce, Hive, and Pig for batch analysis. In this talk, Todd will provide an introduction to the capabilities and characteristics of HBase, comparing and contrasting it with traditional database systems. He will also introduce its architecture and data model, and present some example use cases.


Apache HBase: an introduction

Introductions

Outline

Apache HBase
HBase is an open source, distributed, sorted map datastore modeled after Google’s BigTable.

Open Source

Distributed

Sorted Map Datastore

Sorted Map Datastore (logical view as “records”)

Row key    Data
cutting    info:  { ‘height’: ‘9ft’, ‘state’: ‘CA’ }
           roles: { ‘ASF’: ‘Director’, ‘Hadoop’: ‘Founder’ }
tlipcon    info:  { ‘height’: ‘5ft7’, ‘state’: ‘CA’ }
           roles: { ‘Hadoop’: ‘Committer’@ts=2010, ‘Hadoop’: ‘PMC’@ts=2011, ‘Hive’: ‘Contributor’ }

The row key is the implicit PRIMARY KEY in RDBMS terms.
Data is all byte[] in HBase.
Different types of data are separated into different “column families”.
Different rows may have different sets of columns (the table is sparse).
A single cell might have different values at different timestamps.
Useful for *-To-Many mappings.
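The logical view above can be sketched as a nested map. This is a minimal in-memory illustration in Python, not the HBase client API: the names `table`, `put`, and `get` are hypothetical, and timestamps the slide does not give are assumed values marked in the comments.

```python
from typing import Dict, Optional

# Hypothetical in-memory model of the logical view (NOT the HBase client API):
# row key -> column family -> column qualifier -> {timestamp: value}
Versions = Dict[int, bytes]
Table = Dict[bytes, Dict[bytes, Dict[bytes, Versions]]]

table: Table = {}


def put(row: bytes, family: bytes, qualifier: bytes, ts: int, value: bytes) -> None:
    """Store one cell version; rows and columns are created on demand (sparse table)."""
    table.setdefault(row, {}).setdefault(family, {}).setdefault(qualifier, {})[ts] = value


def get(row: bytes, family: bytes, qualifier: bytes,
        ts: Optional[int] = None) -> Optional[bytes]:
    """Return the value stored at `ts`, or the newest version when `ts` is None."""
    versions = table.get(row, {}).get(family, {}).get(qualifier)
    if not versions:
        return None
    if ts is None:
        ts = max(versions)
    return versions.get(ts)


# Everything is bytes, as on the slide ("Data is all byte[] in HBase").
put(b"tlipcon", b"roles", b"Hadoop", 2010, b"Committer")
put(b"tlipcon", b"roles", b"Hadoop", 2011, b"PMC")
put(b"tlipcon", b"roles", b"Hive", 2010, b"Contributor")  # timestamp assumed
put(b"cutting", b"info", b"height", 2010, b"9ft")          # timestamp assumed
```

Here `get(b"tlipcon", b"roles", b"Hadoop")` picks the newest version of the cell, while passing `ts=2010` reads the older one; asking for a column a row never wrote simply yields `None`, which is what sparseness looks like in this model.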

Sorted Map Datastore (physical view as “cells”)

info Column Family
Row key    Column key     Timestamp        Cell value
cutting    info:height    1273516197868    9ft
cutting    info:state     1043871824184    CA
tlipcon    info:height    1273878447049    5ft7
tlipcon    info:state     1273616297446    CA

roles Column Family
Row key    Column key      Timestamp        Cell value
cutting    roles:ASF       1273871823022    Director
cutting    roles:Hadoop    1183746289103    Founder
tlipcon    roles:Hadoop    1300062064923    PMC
tlipcon    roles:Hadoop    1293388212294    Committer
tlipcon    roles:Hive      1273616297446    Contributor

Cells are sorted on disk by row key, then column key, then descending timestamp.
Timestamps are milliseconds since the Unix epoch.
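The sort order of the physical view can be illustrated with a short Python sketch. It uses the roles cells from the slide; the `Cell` type and the `sort_key` and `latest` helpers are hypothetical names for illustration and say nothing about how HBase lays out its files internally.

```python
from typing import List, NamedTuple, Optional


class Cell(NamedTuple):
    row: bytes
    column: bytes
    timestamp: int  # milliseconds since the Unix epoch
    value: bytes


def sort_key(cell: Cell):
    # Row key ascending, column key ascending, timestamp DESCENDING,
    # so the newest version of a cell comes first.
    return (cell.row, cell.column, -cell.timestamp)


# The roles column family from the slide, deliberately shuffled.
cells: List[Cell] = [
    Cell(b"tlipcon", b"roles:Hadoop", 1293388212294, b"Committer"),
    Cell(b"cutting", b"roles:Hadoop", 1183746289103, b"Founder"),
    Cell(b"tlipcon", b"roles:Hive", 1273616297446, b"Contributor"),
    Cell(b"tlipcon", b"roles:Hadoop", 1300062064923, b"PMC"),
    Cell(b"cutting", b"roles:ASF", 1273871823022, b"Director"),
]

ordered = sorted(cells, key=sort_key)


def latest(ordered_cells: List[Cell], row: bytes, column: bytes) -> Optional[bytes]:
    """The first match in sorted order is the newest version."""
    for cell in ordered_cells:
        if cell.row == row and cell.column == column:
            return cell.value
    return None
```

Because timestamps sort in descending order, a reader that wants only the current value of a cell can stop at the first match instead of scanning every version.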

Column Families

Accessing HBase

HBase API

High Level Architecture
[Diagram components: HBase, HDFS, ZooKeeper, Java Client, MapReduce, Hive/Pig, Thrift/REST Gateway, Your Java Application]

Terms and Daemons

Cluster Architecture
[Diagram components: Client; ZK Quorum of three ZK Peers; HMaster (two shown); RegionServers; HDFS]
Client finds RegionServer addresses in ZooKeeper.
Client reads and writes rows by directly accessing the RegionServers.
Master assigns regions and achieves load balancing.
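Since the client talks to RegionServers directly, it needs a way to map a row key to the server hosting it. The Python sketch below shows one way such a lookup could work, assuming each region covers a contiguous, sorted range of row keys. The region boundaries, host names, ports, and the `locate` function are all hypothetical; the real client's lookup path is not described on the slide.

```python
import bisect
from typing import List, Tuple

# Hypothetical region map: (start row key, RegionServer address).
# Each region covers [its start key, the next region's start key);
# the first region starts at the empty key.
REGIONS: List[Tuple[bytes, str]] = [
    (b"", "rs1.example.com:60020"),
    (b"h", "rs2.example.com:60020"),
    (b"q", "rs3.example.com:60020"),
]
START_KEYS = [start for start, _ in REGIONS]


def locate(row_key: bytes) -> str:
    """Return the address of the server whose region contains `row_key`."""
    # bisect_right finds the first start key greater than row_key;
    # the region just before that position is the one holding the row.
    idx = bisect.bisect_right(START_KEYS, row_key) - 1
    return REGIONS[idx][1]
```

With these made-up boundaries, `locate(b"cutting")` resolves to the first server and `locate(b"tlipcon")` to the third, so the two rows would be served by different machines without the master sitting in the read or write path.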

Cluster Deployment (big cluster)
[Diagram components: HDFS NameNode; Secondary NameNode; MapReduce JobTracker; three ZooKeeper nodes; two HMasters; slave nodes each running RegionServer, DataNode, and TaskTracker]
ZK: 3 or 5 nodes.
HMaster with one standby.
40+ slaves with HBase, HDFS, and MR slave processes.

Cluster Deployment (small cluster / POC)
[Diagram components: NameNode, SecondaryNameNode, HMaster, JobTracker, ZooKeeper; slave nodes each running RegionServer, DataNode, and TaskTracker]
The proverbial basket full of eggs.
5+ slaves with HBase, HDFS, and MR slave processes.

HBase vs just HDFS

                          Plain HDFS/MR                                    HBase
Write pattern             Append-only                                      Random write, bulk incremental
Read pattern              Full table scan, partition table scan            Random read, small range scan, or table scan
Hive (SQL) performance    Very good                                        4-5x slower
Structured storage        Do-it-yourself / TSV / SequenceFile / Avro / ?   Sparse column-family data model
Max data size             30+ PB                                           ~1PB

If you have neither random write nor random read, stick to HDFS!

HBase vs RDBMS

                                RDBMS                           HBase
Data layout                     Row-oriented                    Column-family-oriented
Transactions                    Multi-row ACID                  Single row only
Query language                  SQL                             get/put/scan/etc *
Security                        Authentication/Authorization    Work in progress
Indexes                         On arbitrary columns            Row-key only
Max data size                   TBs                             ~1PB
Read/write throughput limits    1000s queries/second            Millions of queries/second

HBase vs other “NoSQL”

HBase in Numbers

SaaS Audit Logging

Facebook Analytics
http://tiny.cloudera.com/hbase-fb-analytics

OpenTSDB
http://opentsdb.net

Use HBase if…

Don’t use HBase if…

Resources

Questions?


Product School

Neuro-symbolic is not enough, we need neuro-*semantic*

Frank van Harmelen

Neuro-symbolic (NeSy) AI is on the rise. However, simply machine learning on just any symbolic structure is not sufficient to really harvest the gains of NeSy. These will only be gained when the symbolic structures have an actual semantics. I give an operational definition of semantics as “predictable inference”. All of this illustrated with link prediction over knowledge graphs, but the argument is general.

When stars align: studies in data quality, knowledge graphs, and machine lear...

Elena Simperl

From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...

Product School

Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality

Inflectra

In this insightful webinar, Inflectra explores how artificial intelligence (AI) is transforming software development and testing. Discover how AI-powered tools are revolutionizing every stage of the software development lifecycle (SDLC), from design and prototyping to testing, deployment, and monitoring. Learn about: • The Future of Testing: How AI is shifting testing towards verification, analysis, and higher-level skills, while reducing repetitive tasks. • Test Automation: How AI-powered test case generation, optimization, and self-healing tests are making testing more efficient and effective. • Visual Testing: Explore the emerging capabilities of AI in visual testing and how it's set to revolutionize UI verification. • Inflectra's AI Solutions: See demonstrations of Inflectra's cutting-edge AI tools like the ChatGPT plugin and Azure Open AI platform, designed to streamline your testing process. Whether you're a developer, tester, or QA professional, this webinar will give you valuable insights into how AI is shaping the future of software delivery.

GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...

Sri Ambati

FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf

FIDO Alliance

Knowledge engineering: from people to machines and back

Elena Simperl

Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024

Tobias Schneck

As AI technology is pushing into IT I was wondering myself, as an “infrastructure container kubernetes guy”, how get this fancy AI technology get managed from an infrastructure operational view? Is it possible to apply our lovely cloud native principals as well? What benefit’s both technologies could bring to each other? Let me take this questions and provide you a short journey through existing deployment models and use cases for AI software. On practical examples, we discuss what cloud/on-premise strategy we may need for applying it to our own infrastructure to get it to work from an enterprise perspective. I want to give an overview about infrastructure requirements and technologies, what could be beneficial or limiting your AI use cases in an enterprise environment. An interactive Demo will give you some insides, what approaches I got already working for real.

The Art of the Pitch: WordPress Relationships and Sales

Laura Byrne

Clients don’t know what they don’t know. What web solutions are right for them? How does WordPress come into the picture? How do you make sure you understand scope and timeline? What do you do if sometime changes? All these questions and more will be explored as we talk about matching clients’ needs with what your agency offers without pulling teeth or pulling your hair out. Practical tips, and strategies for successful relationship building that leads to closing the deal.

Recently uploaded (20)