Big Data Ready Enterprise (BDRE) | Open Source Product
Big Data Ready Enterprise (BDRE)
An Open Source Product
Speakers
Arijit Banerjee
•  BDRE Product Architect
- Wipro Technologies
Rahul Sarda
•  Big Data Practice Head
- Wipro Technologies
Agenda
1 Typical Big Data use cases and common challenges
2 How BDRE addresses the needs across the lifecycle
3 Fast track implementations using BDRE
4 Demo
5 BDRE In Action: Implementations Underway
6 Typical enterprise deployment view with BDRE
7 BDRE Roadmap
Typical Use Cases & Common Challenges
When implementing and operationalizing at large scale
Jumpstart the framework implementation
Shorten the implementation cycle for applications
Reduce repetitive work across multi-step process
Robust application deployment support
Support flexible operations & SLA
Assure quality of service
Batch Data Processing
Enterprise Data Provisioning Platform
Complex Pipeline Transformation & Semantic Processing
Data as a Service
Event Stream & Micro batch Processing
Low Latency Store
Migration of EDW workloads
Enterprise Analytical Platform
How to?
How BDRE addresses the needs across the lifecycle
Pluggable Architecture
Community Driven
Distribution Compatible
Implementation Jumpstart
Key Benefits
Basic Hadoop
– at the base
“Pre-built features”
BDRE
§  Operational Metadata
§  Ingestion Accelerators
§  Ad-hoc Data Movement
§  Visual Data Pipeline
§  Workflow Automation
§  One Touch Deployment
§  SLA Management
§  Rich Visualization
Implementation effort from scratch
APPLICATIONS
HADOOP
Reduce implementation effort
Complements effort already spent
Data Quality and Data Profiling
Operational Metadata & Lineage
Data Ingestion via Multiple Sources
One Touch Deployment
Analytics & Visualization
Job Automation & Security Integration
FastTrack Implementation using BDRE
Key things that can be rapidly implemented using the product
§  Job designer with dependency, metadata & batch information tagging
§  UI based Semantic & ETL Framework
§  Authentication: Integration with Kerberos & JAAS
§  DWH features: SCD2 implementation
§  Supports Hive, Pig, MapReduce, Spark, R, Python
§  Automated central deployment and application management
§  Registry of all workflow processes / templates
§  Automated Workflow Generator – Oozie & others
§  Automated Process flow Planner
§  Workflow Designer
§  Support for Executing Models – R, Python, Spark
§  Zero Coding UI based configuration for common use cases
§  User Interface based metadata interaction & search
§  Data Exploration integration with notebooks
§  Visual Representation of workflow
§  Abstraction layer: Component to ingest variety of data (CPY, XML, DB, Mainframes)
§  Streaming Data Ingest – 16 sources with Twitter, Flume, IoT, logs, message queue
§  File Monitoring: Component to check validity of incoming data at file and record level
§  C2C Hive Table Migration
§  Enforce Data Quality and Data processing rules (during ingestion or post ingestion)
§  DQ Analysis, Integrity & Failure Handling
§  Data Loading - Test Data Generation
§  Real time state of jobs & run control
§  Visual flow of the Business Workflow / Process
§  Multi & Incremental Process Pipelining
§  Visuals for Process Lineage & Auditing
§  Job Performance Analysis & Optimization
Demo
Key Features
Job registry
Dependency management
Batch management/tracking.
Run control
Execution status
Ingestion registry
Execution statistics logging (key/value)
Hive queries and data lineage information.
Job monitoring & alerting
HTML 5 user interface and REST APIs
Access operational metadata
Realtime job monitoring
Dependency Pipeline
Hive data lineage
SLA Monitoring
Plugin management
Operational Metadata Management
Plugins
Teradata Queries in workflow Support
Test data manufacturer
Web Crawler
C2C Hive Table Migration
Data Quality
Tabular data from RDBMS
Streaming data from multiple sources
File ingestion by directory monitoring
Automation
One click deploy/schedule
One click app store
Job export and sharing
Custom job automation
Oozie workflow generation
Workflow designer
Batch lineage
Generalized UI Data Ingestion Framework
Workflow Automation
BDRE In Action: Implementations Underway
For a leading global bank, reduced the implementation cycle by 40+% for linking millions of transactions daily across client groups and its lines of business, covering more than 65 countries.
For the largest consumer electronics company, a proof of concept for redistribution of data across multiple clusters, with enablement of incremental workloads.
For a large retailer, managing SLAs and automated scheduling for its 1000+ jobs across multiple lines of business, reducing the effort by 50% and enabling automated deployment.
Improved regulatory reporting cycle by 40%
Enabled annual savings of $2.5mn for EDW optimization
Support faster data migration across distribution
For a large bank in the UK, we are enabling real-time workloads using new-age technologies, with several reusable templates for data transformations, profiling and real-time ingests covering 9000+ data sources.
For a large energy giant, supporting the movement of predictive models for detecting parasitic events in advance.
For a large retailer, enabling migration of on-premise EDW workloads for more than 1200 complex entities by moving the workload on to Amazon EMR & Redshift.
Estimated savings of $3mn for real time streaming platform
Cloud enablement with projected savings of $2mn
Data exploration and predictive modeling
Typical enterprise deployment view with BDRE
Eventing Framework
Espresso Email
SLA notification
Proactive Reporting
Hadoop Cluster
NN
RM
Job (shown five times)
Data Quality Workflow
Non Hadoop Workflows
Ingestion Workflow
Semantic Workflow
Bulk Data generation Workflow
APP Store (Git Repo)
Edge Node
Oozie Workflow Generator
Job Deploy Scripts
Job Export/Import
App Server
BDRE UI App
JAAS
BDRE Rest API
Browser
RDBMS
Operational Metadata
Metastore
Rule Engine (for DQ)
Features in pipeline
Enhancing Foundation Optimizing Core Creating Value
§  Support for execution of predictive models in Spark ML
§  Secured & Robust Data Governance
§  Real Time Data Transformation workflows
§  Graphical Query builder
§  Rich plugin library
§  Self Service BI
§  Data Quality Rule Builder
§  Plugins for common unstructured data processing: PDF, Images, Videos
§  Centralized monitoring across platforms
Get Involved
Project Site
§  http://wiproopensourcepractice.github.io/openbdre/
Source Code
§  https://github.com/wiproopensourcepractice/openbdre
Community Dashboard
§  https://github.com/wiproopensourcepractice/openbdre
BDRE Core Team
Arijit Banerjee
Sri Harsha Boda
Kapil Paliwal
Mishi Vidya Sinku
Sudam Madhav
Prem Kumar
Rahul Sarda
THANK YOU
Additional Slides
Metadata driven Operational Management
BDRE provides out of the box run control, SLA management and workflow generation
[Diagram: a UI and a metadata API layer sit above a logical pipeline between the processes. Process 1 covers batch & real time data provisioning with profiling & data quality checks, Process 2 covers the semantic processing & transformation pipeline, and Processes 3 & 4 cover data exploration & predictive modeling. Each process runs a workflow made of actions and enqueues batches on completion into a dedicated batch queue read by the next process.]
BDRE Metadata Management system
BATCH_CONSUMP_QUEUE
§ source_batch_id BIGINT
§ target_batch_id BIGINT
§ queue_id BIGINT
§ insert_ts TIMESTAMP
§ source_process_id INT
§ start_process_id INT
§ end_ts TIMESTAMP
§ batch_state INT
§ batch_marketing INT
§ batch_marketing VARCHAR (45)
§ process_id INT
BATCH_STATUS
§ batch_state_ INT
§ description VARCHAR (45)
ARCHIVE_CONSUMP_QUEUE
§ source_batch_id BIGINT
§ target_batch_id BIGINT
§ queue_id BIGINT
§ source_process_id INT
§ insert_ts TIMESTAMP
§ start_ts TIMESTAMP
§ end_ts TIMESTAMP
§ batch_state INT
§ batch_marketing VARCHAR (45)
§ process_id INT
INSTANCE_EXEC
§ instance_exec_id BIGINT
§ process_id INT
§ start_ts TIMESTAMP
§ end_ts TIMESTAMP
§ exec_state INT
FILE
§ batch_id BIGINT
§ server_id INT
§ path VARCHAR (45)
§ file_size BIGINT
§ file_hash VARCHAR (45)
§ creation_ts TIMESTAMP
SERVERS
§ server_id INT
§ server_type VARCHAR (45)
§ server_name VARCHAR (45)
§ server_metainfo VARCHAR (…)
§ login_user VARCHAR (45)
§ ssh_private_key VARCHAR (…)
§ server_ip VARCHAR (45)
ETL_DRIVER
§ etl_process_id INT
§ raw_table_id INT
§ base_table_id INT
§ insert_type SMALLINT
§ drop_raw BOOLEAN
§ raw_view_id INT
BATCH
§ batch_id BIGINT
§ source_process_run_id BIGINT
§ batch_type VARCHAR (45)
PROPERTIES
§ process_id INT
§ config_group VARCHAR (10)
§ key VARCHAR (45)
§ value VARCHAR (2048)
§ description VARCHAR (1028)
PROCESS
§ process_id INT
§ description VARCHAR (256)
§ add_ts TIMESTAMP
§ process_name VARCHAR (45)
§ bus_domain_id INT
§ process_type_id INT
§ parent_process_id INT
§ can_recover BOOLEAN
§ enqueuing_process_id INT
§ batch_cut_pattern VARCHAR (45)
§ next_process_id VARCHAR (456)
BUS_DOMAIN
§ bus_domain_id INT
§ description VARCHAR (256)
§ bus_domain_name VARCHAR (45)
§ bus_domain_owner VARCHAR (45)
EXEC_STATUS
§ exec_state_id INT
§ description VARCHAR (45)
PROCESS_TYPE
§ process_type_id INT
§ process_type_name VARCHAR (45)
HIVE_TABLES
§ table_id INT
§ comments VARCHAR (256)
§ location_type VARCHAR (45)
§ dbname VARCHAR (45)
§ batch_id_partition_col VARCHAR (45)
§ table_name VARCHAR (45)
§ type VARCHAR (45)
§ ddl VARCHAR (2048)
Automated Oozie Workflow Generation
Process id   Parent id   Next Steps   Enqueuer Id
100          null        101,102      null
101          100         103          null
102          100         103          null
103          100         100          null
Generated workflow: InitJob 100 → Fork → (InitStep 101 → 101 → HaltStep 101) in parallel with (InitStep 102 → 102 → HaltStep 102) → Join → InitStep 103 → 103 → HaltStep 103 → HaltJob 100
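The table above can be turned into a fork/join plan with a small amount of graph walking. The following is a minimal sketch of that idea, not BDRE's actual generator: the function names `plan_workflow` and `render`, the `Run` action label, and the rule that a step pointing back at its parent marks the end of the job are all assumptions made for illustration.

```python
from collections import defaultdict

# Rows copied from the slide: process id -> (parent id, next process ids).
PROCESSES = {
    100: (None, [101, 102]),
    101: (100, [103]),
    102: (100, [103]),
    103: (100, [100]),
}


def plan_workflow(processes):
    """Return (parent id, stages); a stage with several ids is a fork/join."""
    parent = next(pid for pid, (par, _) in processes.items() if par is None)
    # Count how many sub-processes must finish before each step may start.
    pending = defaultdict(int)
    for pid, (par, nxt) in processes.items():
        if par is None:
            continue  # edges leaving the parent only select the first stage
        for n in nxt:
            if n != parent:
                pending[n] += 1
    stage = sorted(processes[parent][1])
    stages = []
    while stage:
        stages.append(stage)
        ready = set()
        for pid in stage:
            for n in processes[pid][1]:
                if n == parent:
                    continue  # assumed convention: pointing at the parent ends the job
                pending[n] -= 1
                if pending[n] == 0:
                    ready.add(n)
        stage = sorted(ready)
    return parent, stages


def render(parent, stages):
    """Flatten the plan into action names similar to those on the slide."""
    actions = [f"InitJob {parent}"]
    for stage in stages:
        if len(stage) > 1:
            actions.append("Fork")
        for pid in stage:
            actions += [f"InitStep {pid}", f"Run {pid}", f"HaltStep {pid}"]
        if len(stage) > 1:
            actions.append("Join")
    actions.append(f"HaltJob {parent}")
    return actions
```

Calling `render(*plan_workflow(PROCESSES))` lists the branches of a fork one after another; in a real Oozie workflow they would run in parallel between the fork and the join.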
Intra and Inter Process Dependency
[Diagram: parent processes 100, 200, 300 and 400 with their sub-processes 101-103, 201-205, 301-304 and 401-402]
Pid   Enq id   Parent id
300   null     null
301   100      300
302   null     300
303   null     300
304   200      300
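One way to read the Pid / Enq id / Parent id table is as a lookup of which upstream process feeds each step of workflow 300. The sketch below illustrates that reading; the helper names and the readiness rule in `can_start` are assumptions for illustration, not behaviour confirmed by the slides.

```python
# Rows copied from the slide: (pid, enqueuing process id, parent id).
ROWS = [
    (300, None, None),
    (301, 100, 300),
    (302, None, 300),
    (303, None, 300),
    (304, 200, 300),
]


def upstream_of(parent_id, rows):
    """Map each sub-process of parent_id to the upstream process that enqueues for it."""
    return {pid: enq for pid, enq, par in rows
            if par == parent_id and enq is not None}


def can_start(parent_id, rows, pending_batches):
    """Assumed rule: every upstream-fed step needs at least one pending batch.

    pending_batches maps a sub-process id to its number of enqueued batches.
    """
    return all(pending_batches.get(pid, 0) > 0
               for pid in upstream_of(parent_id, rows))
```

Here steps 302 and 303 depend only on other steps inside workflow 300 (intra-process), while 301 and 304 wait on processes 100 and 200 (inter-process).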
Partition Based on Run Execution
Base Table
Runid=1
Runid=2
Runid=3
Runid=4
Runid=Compact
File Load job run# 1
File Load job run# 2
File Load job run# 3
File Load job run# 4
These runid partitions are created after each file load. The runid is automatically supplied by the BDRE run control system.
Runid=Compact (insert overwrite): this partition is overwritten on every file load. It contains compacted data (the single latest record for each key). The value of 'compact' is the MAX value of the BIGINT datatype.
Base View
Select * from base where runid=compact;
History View
Select * from base where runid != compact;
Compaction Job
Runs automatically after the File Load job. It operates on the last compacted partition and the latest run partition, and rewrites the result into the compacted partition.
In this diagram one single table contains both complete history and current records in two different partitions. If full history is not needed, the runid partitions are auto-dropped after compaction.
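The compaction step described above can be sketched in memory as follows. This is an illustration of the idea only: partitions are modelled as a dict of lists instead of Hive partitions, and the record key name `id` and the function names are assumptions.

```python
# MAX value of a signed 64-bit BIGINT, used as the 'compact' run id.
COMPACT = 2**63 - 1


def compact(partitions, latest_runid, key="id"):
    """Merge the compacted partition with the latest run partition.

    partitions maps runid -> list of record dicts. A record from the latest
    run replaces an older record that has the same key.
    """
    merged = {rec[key]: rec for rec in partitions.get(COMPACT, [])}
    for rec in partitions[latest_runid]:
        merged[rec[key]] = rec
    partitions[COMPACT] = list(merged.values())  # plays the role of insert overwrite
    return partitions


def base_view(partitions):
    """Current records: select * from base where runid = compact."""
    return partitions.get(COMPACT, [])


def history_view(partitions):
    """Full history: select * from base where runid != compact."""
    return [rec for runid, recs in partitions.items()
            if runid != COMPACT for rec in recs]
```

Because only the previous compacted partition and the newest run are read, the cost of each compaction does not grow with the number of historical runs.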
Job Status Management
InitJob
HaltJob (success)
TermJob (failure)
InitStep
HaltStep (success)
TermStep (failure)
BDRE Operational Metadata
Fail queue
Success queue
MQ
Consumer
JIRA
§  Halt and TermJob APIs can send a message to MQ for proactive alerting
§  Alternatively, BDRE could connect directly to any alerting/ticket management system, skipping the MQ
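The job lifecycle above can be sketched with in-memory queues standing in for the MQ. Everything here is hypothetical scaffolding: the function names mirror the slide's InitJob / HaltJob / TermJob labels but are not BDRE's real API, and the state strings and ticket text are invented for the example.

```python
import queue

success_queue = queue.Queue()   # stand-in for the success queue on the MQ
fail_queue = queue.Queue()      # stand-in for the fail queue on the MQ
metadata = {}                   # stand-in for operational metadata: job id -> state


def init_job(job_id):
    metadata[job_id] = "RUNNING"


def halt_job(job_id):
    """Mark success and publish a message for downstream consumers."""
    metadata[job_id] = "SUCCESS"
    success_queue.put({"job": job_id, "state": "SUCCESS"})


def term_job(job_id, reason):
    """Mark failure and publish a message for proactive alerting."""
    metadata[job_id] = "FAILED"
    fail_queue.put({"job": job_id, "state": "FAILED", "reason": reason})


def run_job(job_id, steps):
    """Run the step callables in order and record the outcome."""
    init_job(job_id)
    try:
        for step in steps:
            step()
    except Exception as exc:
        term_job(job_id, str(exc))
        return False
    halt_job(job_id)
    return True


def drain_alerts():
    """Consumer side: turn fail-queue messages into ticket texts."""
    tickets = []
    while not fail_queue.empty():
        msg = fail_queue.get()
        tickets.append(f"Job {msg['job']} failed: {msg['reason']}")
    return tickets
```

A consumer such as a ticketing integration would call something like `drain_alerts()` on a schedule; skipping the MQ would mean calling the ticketing system directly inside `term_job`.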
Replicating Similar Jobs with add-on Process
Data Ingestion Process 1
Semantic Process 2
Analytics Process 3
Properties For 1
Properties For 2
Properties For 3
Properties Template
Core Process Template
CREATE NEW REPLICA/PUSH CHANGES
Hive Partition Pruning
Process X: Hive query running from Oozie
Insert overwrite T2
Partition (country, state, runid)
Select col1, col2, ${target_batchid}, country_col, state_col, ${instanceexecid} from T1
Where batchid between ${min_batch} and ${max_batch};
BDRE Operational Metadata (initjob) supplies:
target_batchid=201
instanceexecid=3
min_src_batch=3
max_src_batch=5
Base Hive Table T1 (Y: upstream of X)
Partition based on batchid
batchid=1, batchid=2, batchid=3, batchid=4, batchid=5
The earlier batches were already processed in earlier runs; the remaining ones are the unprocessed batches from the queue.
Semantic Hive Table T2 (populated by process X)
First business partition: Country=us, Country=uk
Second business partition: State=ma, State=ri, State=ca, State=tx, State=en, State=sk
Partition on runid: each state partition contains Runid=1
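The query on this slide is a template whose placeholders are filled from values returned when the job is initialised. A minimal sketch of that substitution using Python's standard `string.Template` follows; the normalised parameter spellings (`target_batchid`, `min_batch`, `max_batch`), the uppercase HiveQL wording and the helper name `render_query` are assumptions, and in BDRE the substitution would be done by the workflow engine, not by this code.

```python
from string import Template

# Query text adapted from the slide; placeholder spellings are normalised here.
QUERY = Template(
    "INSERT OVERWRITE TABLE T2 PARTITION (country, state, runid)\n"
    "SELECT col1, col2, ${target_batchid}, country_col, state_col, ${instanceexecid}\n"
    "FROM T1\n"
    "WHERE batchid BETWEEN ${min_batch} AND ${max_batch}"
)


def render_query(run_params):
    """Fill the placeholders from the values handed out at job initialisation."""
    return QUERY.substitute(
        target_batchid=run_params["target_batchid"],
        instanceexecid=run_params["instanceexecid"],
        min_batch=run_params["min_src_batch"],
        max_batch=run_params["max_src_batch"],
    )


# Values shown on the slide.
params = {
    "target_batchid": 201,
    "instanceexecid": 3,
    "min_src_batch": 3,
    "max_src_batch": 5,
}
sql = render_query(params)
```

Because the WHERE clause filters on the partition column of T1, only the partitions for the pending batch range need to be read, which is the pruning the slide title refers to.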
Data Quality Component
Rule definition
Rule engine UI
Guvnor API
Rules
Original file with all records (Hadoop)
Map only MR job
Mapper 1 Mapper 2 Mapper n
Good records
Bad records
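The data quality flow above checks each record independently against a set of rules and routes it to a good or bad output, which is why a map-only job is enough. The sketch below mimics that with plain Python predicates standing in for rules fetched from the rule engine; the rule helpers and the field names `customer_id` and `amount` are hypothetical.

```python
def not_empty(field):
    """Rule factory: the field must be present and non-empty."""
    return lambda rec: bool(rec.get(field))


def is_positive(field):
    """Rule factory: the field must parse as a number greater than zero."""
    def rule(rec):
        try:
            return float(rec[field]) > 0
        except (KeyError, TypeError, ValueError):
            return False
    return rule


# Hypothetical rule set; in the slide the rules come from the rule engine.
RULES = [not_empty("customer_id"), is_positive("amount")]


def split_records(records, rules):
    """Mapper-style pass: each record is judged on its own, with no reduce step."""
    good, bad = [], []
    for rec in records:
        (good if all(rule(rec) for rule in rules) else bad).append(rec)
    return good, bad
```

Since no record depends on another, the input file can be split across any number of mappers and the good and bad outputs simply concatenated.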
Batch Management
A row is added to the queue table for every downstream process upon each successful execution of an upstream process. The downstream process looks up the queue and processes all pending batches enqueued by upstream.
Multiple source batches consumed = one target batch is produced
[Diagram: logical pipeline between the processes. Workflows 100 (steps 101-103), 200 (steps 201-205), 300 (steps 301-304) and 400 (steps 401-402) are linked through batch queues.]
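The queue behaviour described for batch management can be sketched with a list standing in for the queue table. The column names follow the metadata slide, but the function names and the single shared id counter for source and target batches are assumptions made to keep the example short.

```python
import itertools

queue_rows = []                  # stand-in for the batch queue table
_batch_ids = itertools.count(1)  # assumed: one counter for all batch ids


def enqueue(source_process_id, downstream_ids):
    """On a successful upstream run, add one queue row per downstream process."""
    batch_id = next(_batch_ids)
    for pid in downstream_ids:
        queue_rows.append({
            "source_batch_id": batch_id,
            "source_process_id": source_process_id,
            "process_id": pid,
        })
    return batch_id


def consume(process_id):
    """Take every pending batch for a process and produce one target batch."""
    pending = [r for r in queue_rows if r["process_id"] == process_id]
    if not pending:
        return None
    queue_rows[:] = [r for r in queue_rows if r["process_id"] != process_id]
    return {
        "target_batch_id": next(_batch_ids),
        "source_batch_ids": [r["source_batch_id"] for r in pending],
    }
```

If the upstream process runs twice before the downstream process wakes up, a single `consume` call picks up both source batches and emits one target batch, matching the "multiple source batches, one target batch" note.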

More Related Content

What's hot

Starting Small and Scaling Big with Hadoop (Talend and Hortonworks webinar)) ...
Starting Small and Scaling Big with Hadoop (Talend and Hortonworks webinar)) ...Starting Small and Scaling Big with Hadoop (Talend and Hortonworks webinar)) ...
Starting Small and Scaling Big with Hadoop (Talend and Hortonworks webinar)) ...Hortonworks
 
Logical Data Warehouse: How to Build a Virtualized Data Services Layer
Logical Data Warehouse: How to Build a Virtualized Data Services LayerLogical Data Warehouse: How to Build a Virtualized Data Services Layer
Logical Data Warehouse: How to Build a Virtualized Data Services LayerDataWorks Summit
 
Journey to the Data Lake: How Progressive Paved a Faster, Smoother Path to In...
Journey to the Data Lake: How Progressive Paved a Faster, Smoother Path to In...Journey to the Data Lake: How Progressive Paved a Faster, Smoother Path to In...
Journey to the Data Lake: How Progressive Paved a Faster, Smoother Path to In...DataWorks Summit
 
The Future of Apache Hadoop an Enterprise Architecture View
The Future of Apache Hadoop an Enterprise Architecture ViewThe Future of Apache Hadoop an Enterprise Architecture View
The Future of Apache Hadoop an Enterprise Architecture ViewDataWorks Summit/Hadoop Summit
 
The Time Has Come for Big-Data-as-a-Service
The Time Has Come for Big-Data-as-a-ServiceThe Time Has Come for Big-Data-as-a-Service
The Time Has Come for Big-Data-as-a-ServiceBlueData, Inc.
 
Visualizing Big Data in Realtime
Visualizing Big Data in RealtimeVisualizing Big Data in Realtime
Visualizing Big Data in RealtimeDataWorks Summit
 
Format Wars: from VHS and Beta to Avro and Parquet
Format Wars: from VHS and Beta to Avro and ParquetFormat Wars: from VHS and Beta to Avro and Parquet
Format Wars: from VHS and Beta to Avro and ParquetDataWorks Summit
 
High Performance Spatial-Temporal Trajectory Analysis with Spark
High Performance Spatial-Temporal Trajectory Analysis with Spark High Performance Spatial-Temporal Trajectory Analysis with Spark
High Performance Spatial-Temporal Trajectory Analysis with Spark DataWorks Summit/Hadoop Summit
 
Powering Fast Data and the Hadoop Ecosystem with VoltDB and Hortonworks
Powering Fast Data and the Hadoop Ecosystem with VoltDB and HortonworksPowering Fast Data and the Hadoop Ecosystem with VoltDB and Hortonworks
Powering Fast Data and the Hadoop Ecosystem with VoltDB and HortonworksHortonworks
 
Preventative Maintenance of Robots in Automotive Industry
Preventative Maintenance of Robots in Automotive IndustryPreventative Maintenance of Robots in Automotive Industry
Preventative Maintenance of Robots in Automotive IndustryDataWorks Summit/Hadoop Summit
 
Sharing metadata across the data lake and streams
Sharing metadata across the data lake and streamsSharing metadata across the data lake and streams
Sharing metadata across the data lake and streamsDataWorks Summit
 
Hadoop in the Cloud - The what, why and how from the experts
Hadoop in the Cloud - The what, why and how from the expertsHadoop in the Cloud - The what, why and how from the experts
Hadoop in the Cloud - The what, why and how from the expertsDataWorks Summit/Hadoop Summit
 
Delivering Apache Hadoop for the Modern Data Architecture
Delivering Apache Hadoop for the Modern Data Architecture Delivering Apache Hadoop for the Modern Data Architecture
Delivering Apache Hadoop for the Modern Data Architecture Hortonworks
 
Insights into Real World Data Management Challenges
Insights into Real World Data Management ChallengesInsights into Real World Data Management Challenges
Insights into Real World Data Management ChallengesDataWorks Summit
 
Evolving Hadoop into an Operational Platform with Data Applications
Evolving Hadoop into an Operational Platform with Data ApplicationsEvolving Hadoop into an Operational Platform with Data Applications
Evolving Hadoop into an Operational Platform with Data ApplicationsDataWorks Summit
 
Saving the elephant—now, not later
Saving the elephant—now, not laterSaving the elephant—now, not later
Saving the elephant—now, not laterDataWorks Summit
 

What's hot (20)

Log I am your father
Log I am your fatherLog I am your father
Log I am your father
 
Starting Small and Scaling Big with Hadoop (Talend and Hortonworks webinar)) ...
Starting Small and Scaling Big with Hadoop (Talend and Hortonworks webinar)) ...Starting Small and Scaling Big with Hadoop (Talend and Hortonworks webinar)) ...
Starting Small and Scaling Big with Hadoop (Talend and Hortonworks webinar)) ...
 
Evolving HDFS to a Generalized Storage Subsystem
Evolving HDFS to a Generalized Storage SubsystemEvolving HDFS to a Generalized Storage Subsystem
Evolving HDFS to a Generalized Storage Subsystem
 
Logical Data Warehouse: How to Build a Virtualized Data Services Layer
Logical Data Warehouse: How to Build a Virtualized Data Services LayerLogical Data Warehouse: How to Build a Virtualized Data Services Layer
Logical Data Warehouse: How to Build a Virtualized Data Services Layer
 
Analysis of Major Trends in Big Data Analytics
Analysis of Major Trends in Big Data AnalyticsAnalysis of Major Trends in Big Data Analytics
Analysis of Major Trends in Big Data Analytics
 
Journey to the Data Lake: How Progressive Paved a Faster, Smoother Path to In...
Journey to the Data Lake: How Progressive Paved a Faster, Smoother Path to In...Journey to the Data Lake: How Progressive Paved a Faster, Smoother Path to In...
Journey to the Data Lake: How Progressive Paved a Faster, Smoother Path to In...
 
The Future of Apache Hadoop an Enterprise Architecture View
The Future of Apache Hadoop an Enterprise Architecture ViewThe Future of Apache Hadoop an Enterprise Architecture View
The Future of Apache Hadoop an Enterprise Architecture View
 
The Time Has Come for Big-Data-as-a-Service
The Time Has Come for Big-Data-as-a-ServiceThe Time Has Come for Big-Data-as-a-Service
The Time Has Come for Big-Data-as-a-Service
 
Visualizing Big Data in Realtime
Visualizing Big Data in RealtimeVisualizing Big Data in Realtime
Visualizing Big Data in Realtime
 
Format Wars: from VHS and Beta to Avro and Parquet
Format Wars: from VHS and Beta to Avro and ParquetFormat Wars: from VHS and Beta to Avro and Parquet
Format Wars: from VHS and Beta to Avro and Parquet
 
High Performance Spatial-Temporal Trajectory Analysis with Spark
High Performance Spatial-Temporal Trajectory Analysis with Spark High Performance Spatial-Temporal Trajectory Analysis with Spark
High Performance Spatial-Temporal Trajectory Analysis with Spark
 
Powering Fast Data and the Hadoop Ecosystem with VoltDB and Hortonworks
Powering Fast Data and the Hadoop Ecosystem with VoltDB and HortonworksPowering Fast Data and the Hadoop Ecosystem with VoltDB and Hortonworks
Powering Fast Data and the Hadoop Ecosystem with VoltDB and Hortonworks
 
Preventative Maintenance of Robots in Automotive Industry
Preventative Maintenance of Robots in Automotive IndustryPreventative Maintenance of Robots in Automotive Industry
Preventative Maintenance of Robots in Automotive Industry
 
Sharing metadata across the data lake and streams
Sharing metadata across the data lake and streamsSharing metadata across the data lake and streams
Sharing metadata across the data lake and streams
 
Hadoop in the Cloud - The what, why and how from the experts
Hadoop in the Cloud - The what, why and how from the expertsHadoop in the Cloud - The what, why and how from the experts
Hadoop in the Cloud - The what, why and how from the experts
 
Delivering Apache Hadoop for the Modern Data Architecture
Delivering Apache Hadoop for the Modern Data Architecture Delivering Apache Hadoop for the Modern Data Architecture
Delivering Apache Hadoop for the Modern Data Architecture
 
Insights into Real World Data Management Challenges
Insights into Real World Data Management ChallengesInsights into Real World Data Management Challenges
Insights into Real World Data Management Challenges
 
Evolving Hadoop into an Operational Platform with Data Applications
Evolving Hadoop into an Operational Platform with Data ApplicationsEvolving Hadoop into an Operational Platform with Data Applications
Evolving Hadoop into an Operational Platform with Data Applications
 
Real Time Machine Learning Visualization with Spark
Real Time Machine Learning Visualization with SparkReal Time Machine Learning Visualization with Spark
Real Time Machine Learning Visualization with Spark
 
Saving the elephant—now, not later
Saving the elephant—now, not laterSaving the elephant—now, not later
Saving the elephant—now, not later
 

Similar to Big Data Ready Enterprise

InfoSphere BigInsights - Analytics power for Hadoop - field experience
InfoSphere BigInsights - Analytics power for Hadoop - field experienceInfoSphere BigInsights - Analytics power for Hadoop - field experience
InfoSphere BigInsights - Analytics power for Hadoop - field experienceWilfried Hoge
 
Putting the Ops in DataOps: Orchestrate the Flow of Data Across Data Pipelines
Putting the Ops in DataOps: Orchestrate the Flow of Data Across Data PipelinesPutting the Ops in DataOps: Orchestrate the Flow of Data Across Data Pipelines
Putting the Ops in DataOps: Orchestrate the Flow of Data Across Data PipelinesDATAVERSITY
 
A Key to Real-time Insights in a Post-COVID World (ASEAN)
A Key to Real-time Insights in a Post-COVID World (ASEAN)A Key to Real-time Insights in a Post-COVID World (ASEAN)
A Key to Real-time Insights in a Post-COVID World (ASEAN)Denodo
 
Skillwise Big Data part 2
Skillwise Big Data part 2Skillwise Big Data part 2
Skillwise Big Data part 2Skillwise Group
 
Horses for Courses: Database Roundtable
Horses for Courses: Database RoundtableHorses for Courses: Database Roundtable
Horses for Courses: Database RoundtableEric Kavanagh
 
SPS Vancouver 2018 - What is CDM and CDS
SPS Vancouver 2018 - What is CDM and CDSSPS Vancouver 2018 - What is CDM and CDS
SPS Vancouver 2018 - What is CDM and CDSNicolas Georgeault
 
OPEN'17_4_Postgres: The Centerpiece for Modernising IT Infrastructures
OPEN'17_4_Postgres: The Centerpiece for Modernising IT InfrastructuresOPEN'17_4_Postgres: The Centerpiece for Modernising IT Infrastructures
OPEN'17_4_Postgres: The Centerpiece for Modernising IT InfrastructuresKangaroot
 
Using the Power of Big SQL 3.0 to Build a Big Data-Ready Hybrid Warehouse
Using the Power of Big SQL 3.0 to Build a Big Data-Ready Hybrid WarehouseUsing the Power of Big SQL 3.0 to Build a Big Data-Ready Hybrid Warehouse
Using the Power of Big SQL 3.0 to Build a Big Data-Ready Hybrid WarehouseRizaldy Ignacio
 
Sudhir Rawat, Sr Techonology Evangelist at Microsoft SQL Business Intelligenc...
Sudhir Rawat, Sr Techonology Evangelist at Microsoft SQL Business Intelligenc...Sudhir Rawat, Sr Techonology Evangelist at Microsoft SQL Business Intelligenc...
Sudhir Rawat, Sr Techonology Evangelist at Microsoft SQL Business Intelligenc...Dataconomy Media
 
Microsoft cloud big data strategy
Microsoft cloud big data strategyMicrosoft cloud big data strategy
Microsoft cloud big data strategyJames Serra
 
Business Intelligence Best Practice Summit: BI Quo Vadis
Business Intelligence Best Practice Summit:  BI Quo VadisBusiness Intelligence Best Practice Summit:  BI Quo Vadis
Business Intelligence Best Practice Summit: BI Quo VadisManagility
 
Self-Service Analytics with Guard Rails
Self-Service Analytics with Guard RailsSelf-Service Analytics with Guard Rails
Self-Service Analytics with Guard RailsDenodo
 
Bring Your SAP and Enterprise Data to Hadoop, Kafka, and the Cloud
Bring Your SAP and Enterprise Data to Hadoop, Kafka, and the CloudBring Your SAP and Enterprise Data to Hadoop, Kafka, and the Cloud
Bring Your SAP and Enterprise Data to Hadoop, Kafka, and the CloudDataWorks Summit
 
Cloud-Native Patterns for Data-Intensive Applications
Cloud-Native Patterns for Data-Intensive ApplicationsCloud-Native Patterns for Data-Intensive Applications
Cloud-Native Patterns for Data-Intensive ApplicationsVMware Tanzu
 
Feature Store as a Data Foundation for Machine Learning
Feature Store as a Data Foundation for Machine LearningFeature Store as a Data Foundation for Machine Learning
Feature Store as a Data Foundation for Machine LearningProvectus
 
Enabling Next Gen Analytics with Azure Data Lake and StreamSets
Enabling Next Gen Analytics with Azure Data Lake and StreamSetsEnabling Next Gen Analytics with Azure Data Lake and StreamSets
Enabling Next Gen Analytics with Azure Data Lake and StreamSetsStreamsets Inc.
 
6. real time integration with odi 11g & golden gate 11g & dq 11g 20101103 -...
6. real time integration with odi 11g & golden gate 11g & dq 11g   20101103 -...6. real time integration with odi 11g & golden gate 11g & dq 11g   20101103 -...
6. real time integration with odi 11g & golden gate 11g & dq 11g 20101103 -...Doina Draganescu
 

Similar to Big Data Ready Enterprise (20)

InfoSphere BigInsights - Analytics power for Hadoop - field experience
InfoSphere BigInsights - Analytics power for Hadoop - field experienceInfoSphere BigInsights - Analytics power for Hadoop - field experience
InfoSphere BigInsights - Analytics power for Hadoop - field experience
 
Putting the Ops in DataOps: Orchestrate the Flow of Data Across Data Pipelines
Putting the Ops in DataOps: Orchestrate the Flow of Data Across Data PipelinesPutting the Ops in DataOps: Orchestrate the Flow of Data Across Data Pipelines
Putting the Ops in DataOps: Orchestrate the Flow of Data Across Data Pipelines
 
Skilwise Big data
Skilwise Big dataSkilwise Big data
Skilwise Big data
 
A Key to Real-time Insights in a Post-COVID World (ASEAN)
A Key to Real-time Insights in a Post-COVID World (ASEAN)A Key to Real-time Insights in a Post-COVID World (ASEAN)
A Key to Real-time Insights in a Post-COVID World (ASEAN)
 
Skillwise Big Data part 2
Skillwise Big Data part 2Skillwise Big Data part 2
Skillwise Big Data part 2
 
Horses for Courses: Database Roundtable
Horses for Courses: Database RoundtableHorses for Courses: Database Roundtable
Horses for Courses: Database Roundtable
 
SPS Vancouver 2018 - What is CDM and CDS
SPS Vancouver 2018 - What is CDM and CDSSPS Vancouver 2018 - What is CDM and CDS
SPS Vancouver 2018 - What is CDM and CDS
 
OPEN'17_4_Postgres: The Centerpiece for Modernising IT Infrastructures
OPEN'17_4_Postgres: The Centerpiece for Modernising IT InfrastructuresOPEN'17_4_Postgres: The Centerpiece for Modernising IT Infrastructures
OPEN'17_4_Postgres: The Centerpiece for Modernising IT Infrastructures
 
Using the Power of Big SQL 3.0 to Build a Big Data-Ready Hybrid Warehouse
Using the Power of Big SQL 3.0 to Build a Big Data-Ready Hybrid WarehouseUsing the Power of Big SQL 3.0 to Build a Big Data-Ready Hybrid Warehouse
Using the Power of Big SQL 3.0 to Build a Big Data-Ready Hybrid Warehouse
 
Sudhir Rawat, Sr Techonology Evangelist at Microsoft SQL Business Intelligenc...
Sudhir Rawat, Sr Techonology Evangelist at Microsoft SQL Business Intelligenc...Sudhir Rawat, Sr Techonology Evangelist at Microsoft SQL Business Intelligenc...
Sudhir Rawat, Sr Techonology Evangelist at Microsoft SQL Business Intelligenc...
 
Big data ready Enterprise
Big data ready EnterpriseBig data ready Enterprise
Big data ready Enterprise
 
Microsoft cloud big data strategy
Microsoft cloud big data strategyMicrosoft cloud big data strategy
Microsoft cloud big data strategy
 
Business Intelligence Best Practice Summit: BI Quo Vadis
Business Intelligence Best Practice Summit:  BI Quo VadisBusiness Intelligence Best Practice Summit:  BI Quo Vadis
Business Intelligence Best Practice Summit: BI Quo Vadis
 
Self-Service Analytics with Guard Rails
Self-Service Analytics with Guard RailsSelf-Service Analytics with Guard Rails
Self-Service Analytics with Guard Rails
 
Bring Your SAP and Enterprise Data to Hadoop, Kafka, and the Cloud
Bring Your SAP and Enterprise Data to Hadoop, Kafka, and the CloudBring Your SAP and Enterprise Data to Hadoop, Kafka, and the Cloud
Bring Your SAP and Enterprise Data to Hadoop, Kafka, and the Cloud
 
Cloud-Native Patterns for Data-Intensive Applications
Cloud-Native Patterns for Data-Intensive ApplicationsCloud-Native Patterns for Data-Intensive Applications
Cloud-Native Patterns for Data-Intensive Applications
 
Feature Store as a Data Foundation for Machine Learning
Feature Store as a Data Foundation for Machine LearningFeature Store as a Data Foundation for Machine Learning
Feature Store as a Data Foundation for Machine Learning
 
Enabling Next Gen Analytics with Azure Data Lake and StreamSets
Enabling Next Gen Analytics with Azure Data Lake and StreamSetsEnabling Next Gen Analytics with Azure Data Lake and StreamSets
Enabling Next Gen Analytics with Azure Data Lake and StreamSets
 
6. real time integration with odi 11g & golden gate 11g & dq 11g 20101103 -...
6. real time integration with odi 11g & golden gate 11g & dq 11g   20101103 -...6. real time integration with odi 11g & golden gate 11g & dq 11g   20101103 -...
6. real time integration with odi 11g & golden gate 11g & dq 11g 20101103 -...
 
Geode Meetup Apachecon
Geode Meetup ApacheconGeode Meetup Apachecon
Geode Meetup Apachecon
 





Big Data Ready Enterprise

  • 1. Big Data Ready Enterprise (BDRE) | Open Source Product1 An Open Source Product Big Data Ready Enterprise (BDRE)
  • 2. Speakers: Arijit Banerjee, BDRE Product Architect, Wipro Technologies; Rahul Sarda, Big Data Practice Head, Wipro Technologies
  • 3. Agenda: (1) Typical Big Data use cases and common challenges; (2) How BDRE addresses the needs across the lifecycle; (3) Fast track implementations using BDRE; (4) Demo; (5) BDRE In Action: Implementations Underway; (6) Typical enterprise deployment view with BDRE; (7) BDRE Roadmap
  • 4. Typical Use Cases & Common Challenges when implementing and operationalizing at large scale. Use cases: Batch Data Processing; Enterprise Data Provisioning Platform; Complex Pipeline Transformation & Semantic Processing; Data as a Service; Event Stream & Micro batch Processing; Low Latency Store; Migration of EDW workloads; Enterprise Analytical Platform. How to: Jumpstart the framework implementation; Shorten the implementation cycle for applications; Reduce repetitive work across multi-step process; Robust application deployment support; Support flexible operations & SLA; Assure quality of service
  • 5. How BDRE addresses the needs across the lifecycle. Key Benefits: Pluggable Architecture; Community Driven; Distribution Compatible; Implementation Jumpstart. Basic Hadoop at the base. "Pre-built features" in BDRE: Operational Metadata; Ingestion Accelerators; Ad-hoc Data Movement; Visual Data Pipeline; Workflow Automation; One Touch Deployment; SLA Management; Rich Visualization. Diagram labels: Implementation effort from scratch; APPLICATIONS; HADOOP; Reduce implementation effort; Complements Effort Already Spent
  • 6. FastTrack Implementation using BDRE: key things that can be rapidly implemented using the product. Areas: Data Quality and Data Profiling; Operational Metadata & Lineage; Data Ingestion via Multiple Sources; One Touch Deployment; Analytics & Visualization; Job Automation & Security Integration. Features: Job designer with dependency, metadata & batch information tagging; UI based Semantic & ETL Framework; Authentication: integration with Kerberos & JAAS; DWH features: SCD2 implementation; Supports Hive, Pig, Map Reduce, Spark, R, Python; Automated central deployment and application management; Registry of all workflow processes / templates; Automated Workflow Generator (Oozie & others); Automated Process flow Planner; Workflow Designer; Support for Executing Models (R, Python, Spark); Zero Coding UI based configuration for common use cases; User Interface based metadata interaction & search; Data Exploration integration with notebooks; Visual Representation of workflow; Abstraction layer: component to ingest variety of data (CPY, XML, DB, Mainframes); Streaming Data Ingest: 16 sources with Twitter, Flume, IOT, logs, message queue; File Monitoring: component to check validity of incoming data at file and record level; C2C Hive Table Migration; Enforce Data Quality and Data processing rules (during ingestion or post ingestion); DQ Analysis, Integrity & Failure Handling; Data Loading: Test Data Generation; Real time state of jobs & run control; Visual flow of the Business Workflow / Process; Multi & Incremental Process Pipelining; Visuals for Process Lineage & Auditing; Job Performance Analysis & Optimization
  • 7. Demo
  • 8. Key Features: Job registry; Dependency management; Batch management/tracking; Run control; Execution status; Ingestion registry; Execution statistics logging (key/value); Hive queries and data lineage information; Job monitoring & alerting; HTML 5 user interface and REST APIs; Access operational metadata; Realtime job monitoring; Dependency Pipeline; Hive data lineage; SLA Monitoring; Plugin management; Teradata Queries in workflow Support; Test data manufacturer; Web Crawler; C2C Hive Table Migration; Data Quality; Tabular data from RDBMS; Streaming data from multiple sources; File ingestion by directory monitoring; One click deploy/schedule; One click app store; Job export and sharing; Custom job automation; Oozie workflow generation; Workflow designer; Batch lineage. Group headings on the slide: Operational Metadata Management; Generalized UI; Plugins; Data Ingestion Framework; Workflow Automation
  • 9. BDRE In Action: Implementations Underway. Engagements: For a leading global bank, reduced the implementation cycle by 40+% for linking millions of transactions daily across client groups and its lines of business covering more than 65 countries. For the largest consumer electronics, proof of concept for redistribution of the data across multiple clusters with enablement of incremental workloads. For a large retailer, managing SLAs and automated scheduling for its 1000+ jobs across multiple lines of business, reducing the effort by 50% and enabling automated deployment. For a large bank in the UK, enabling real time workloads using new age technologies, with several reusable templates for data transformations, profiling and real time ingests covering 9000+ data sources. For a large energy giant, supporting the movement of predictive models on parasitic event detection in advance. For a large retailer, enabling migration of on-premise EDW workloads for 1200+ complex entities by moving the workload on to Amazon EMR & Redshift. Outcomes called out on the slide: Improved regulatory reporting cycle by 40%; Enabled annual savings of $2.5mn for EDW optimization; Support faster data migration across distribution; Estimated savings of $3mn for real time streaming platform; Cloud enablement with projected savings of $2mn; Data exploration and predictive modeling
  • 10. Typical enterprise deployment view with BDRE. Diagram components: Browser; App Server (BDRE UI App, BDRE Rest API, JAAS); Operational Metadata RDBMS; Metastore; Rule Engine (for DQ); Edge Node (Oozie Workflow Generator, Job Deploy Scripts, Job Export/Import); APP Store (Git Repo); Hadoop Cluster (NN, RM, Jobs); Workflows: Data Quality Workflow, Ingestion Workflow, Semantic Workflow, Bulk Data generation Workflow, Non Hadoop Workflows; Eventing Framework (Espresso): Email, SLA notification, Proactive Reporting
  • 11. Features in pipeline. Themes: Enhancing Foundation; Optimizing Core; Creating Value. Planned features: Support for execution of predictive models in Spark ML; Secured & Robust Data Governance; Real Time Data Transformation workflows; Graphical Query builder; Rich plugin library; Self Service BI; Data Quality Rule Builder; Plugins for common unstructured data processing (PDF, Images, Videos); Centralize monitoring across platforms
  • 12. Get Involved. Project Site: http://wiproopensourcepractice.github.io/openbdre/ Source Code: https://github.com/wiproopensourcepractice/openbdre Community Dashboard: https://github.com/wiproopensourcepractice/openbdre
  • 13. BDRE Core Team Arijit Banerjee Sri Harsha Boda Kapil Paliwal Mishi Vidya Sinku Sudam Madhav Prem Kumar Rahul Sarda THANK YOU
  • 14. Additional Slides
  • 15. BDRE Screenshots
  • 16. Metadata driven Operational Management. BDRE provides out of the box run control, SLA management and workflow generation. Layers: UI; Metadata API Layer. Pipeline stages shown: Batch & Real Time data provisioning with profiling & data quality checks; Semantic Processing & Transformation Pipeline; Data exploration & Predictive modeling. Each stage is a workflow made up of actions; each workflow enqueues batches on completion into a dedicated batch queue that feeds the next stage. Process 1, Process 2, and Process 3 & 4 are connected by a logical pipeline between the processes
  • 17. BDRE Metadata Management system (tables and columns)
      BATCH_CONSUMP_QUEUE: Source_batch_id BIGINT, target_batch_id BIGINT, queue_id BIGINT, insert_ts TIMESTAMP, source_process_id INT, start_process_id INT, end_ts TIMESTAMP, batch_state INT, batch_marketing INT, Batch_marketing VARCHAR (45), process_id INT
      BATCH_STATUS: batch_state_ INT, Description VARCHAR (45)
      ARCHIVE_CONSUMP_QUEUE: source_batch_id BIGINT, target_batch_id BIGINT, queue_id BIGINT, source_process_id INT, insert_ts TIMESTAMP, start_ts TIMESTAMP, end_ts TIMESTAMP, batch_state INT, batch_marketing VARCHAR (45), process_id INT
      INSTANCE_EXEC: instance_exec_id BIGINT, process_id INT, start_ts TIMESTAMP, End_ts TIMESTAMP, exec_state INT
      FILE: batch_id BIGINT, server_id INT, Path VARCHAR (45), File_size BIGINT, File_hash VARCHAR (45), creation_ts TIMESTAMP
      SERVERS: server_id INT, server_type VARCHAR (45), server_name VARCHAR (45), server_metainfo VARCHAR (…), login_user VARCHAR (45), ssh_private_key VARCHAR (…), server_ip VARCHAR (45)
      ETL_DRIVER: etl_process_id INT, raw_table_id INT, base_table_id INT, insert_type SMALLINT, drop_raw BOOLEAN, raw_view_id INT
      BATCH: batch_id BIGINT, source_process_run_id BIGINT, batch_type VARCHAR (45)
      PROPERTIES: process_id INT, config_group VARCHAR (10), key VARCHAR (45), value VARCHAR (2048), description VARCHAR (1028)
      PROCESS: process_id INT, description VARCHAR (256), add_ts TIMESTAMP, process_name VARCHAR (45), bus_domain_id INT, process_type_id INT, parent_process_id INT, can_recover BOOLEAN, enqueuing_process_id INT, batch_cut_pattern VARCHAR (45), next_process_id VARCHAR (456)
      BUS_DOMAIN: bus_domain_id INT, description VARCHAR (256), bus_domain_name VARCHAR (45), bus_domain_owner VARCHAR (45)
      EXEC_STATUS: Exec_state_id INT, Description VARCHAR (45)
      PROCESS_TYPE: process_type_id INT, process_type_name VARCHAR (45)
      HIVE_TABLES: table_id INT, comments VARCHAR (256), location_type VARCHAR (45), dbname VARCHAR (45), batch_id_partition_col VARCHAR (45), table_name VARCHAR (45), type VARCHAR (45), ddl VARCHAR (2048)
  • 18. Automated Oozie Workflow Generation
      Process id | Parent id | Next Steps | Enqueuer Id
      100 | null | 101,102 | null
      101 | 100 | 103 | null
      102 | 100 | 103 | null
      103 | 100 | 100 | null
      Generated flow: InitJob 100 -> Fork -> (InitStep 101 -> 101 -> HaltStep 101) in parallel with (InitStep 102 -> 102 -> HaltStep 102) -> Join -> InitStep 103 -> 103 -> HaltStep 103 -> HaltJob 100
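The fork/join shape on this slide can be derived mechanically from the "Next Steps" column. The following is a minimal sketch, not BDRE's actual generator: it assumes the parent row lists the first steps, the last step points back to the parent, and steps can be grouped into levels that run in parallel. The function name `plan_workflow` and the textual plan format are hypothetical.

```python
from collections import defaultdict

# Rows copied from the slide: (process_id, parent_id, next_process_ids).
ROWS = [
    (100, None, [101, 102]),
    (101, 100, [103]),
    (102, 100, [103]),
    (103, 100, [100]),
]


def plan_workflow(rows):
    """Return an ordered list of workflow nodes with Fork/Join markers."""
    parent = next(pid for pid, par, _ in rows if par is None)
    nexts = {pid: nxt for pid, _, nxt in rows}

    # Predecessors of each step, ignoring edges that touch the parent job.
    preds = defaultdict(set)
    for pid, _, nxt in rows:
        for n in nxt:
            if pid != parent and n != parent:
                preds[n].add(pid)

    plan = [f"InitJob {parent}"]
    done = set()
    level = list(nexts[parent])  # steps that start right after InitJob
    while level:
        parallel = len(level) > 1
        if parallel:
            plan.append("Fork " + ",".join(map(str, level)))
        for pid in level:
            plan += [f"InitStep {pid}", f"HaltStep {pid}"]
        done.update(level)
        if parallel:
            plan.append("Join " + ",".join(map(str, level)))
        # A step is ready once every predecessor has completed.
        candidates = sorted(
            {n for pid in level for n in nexts[pid] if n != parent and n not in done}
        )
        level = [n for n in candidates if preds[n] <= done]
    plan.append(f"HaltJob {parent}")
    return plan


if __name__ == "__main__":
    for node in plan_workflow(ROWS):
        print(node)
```

A real generator would emit Oozie XML actions rather than strings; the point here is only that the table carries enough information to place the fork and the join.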
  • 19. Intra and Inter Process Dependency. Parent processes and their sub-processes: Process 100 (101, 102, 103); Process 200 (201, 202, 203, 204, 205); Process 300 (301, 302, 303, 304); Process 400 (401, 402)
      Pid | Enq id | Parent id
      300 | Null | Null
      301 | 100 | 300
      302 | Null | 300
      303 | Null | 300
      304 | 200 | 300
  • 20. Partition Based on Run Execution. In this diagram one single table contains both complete history and current records in two different partitions. Base Table partitions: Runid=1, Runid=2, Runid=3, Runid=4 (from File Load job runs #1 to #4) and Runid=Compact. The runid partitions are created after each file load; the runid is automatically supplied by the BDRE run control system. The compact partition is overwritten (insert overwrite) on every file load and contains compacted data (one single latest record for each key); the value of 'compact' is the MAX value of the BIGINT datatype. Compaction Job: runs automatically after the File Load job; it operates on the last compacted partition and the latest run partition and rewrites into the compacted partition. If full history is not needed, the runid partitions are auto-dropped after compaction. Base View: Select * from base where runid=compact; History View: Select * from base where runid != compact;
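The compaction step described on this slide can be illustrated with plain Python dictionaries standing in for Hive partitions. This is a sketch under stated assumptions, not BDRE code: the key column name `id`, the in-memory `partitions` dict, and the helper names are all hypothetical; only the idea (merge the last compacted partition with the latest run partition, keep one latest record per key, use the BIGINT maximum as the compact runid) comes from the slide.

```python
# Max value of a signed 64-bit BIGINT, used as the runid of the compacted partition.
COMPACT = 2**63 - 1


def compact(partitions, run_id, key="id"):
    """Merge the latest run partition into the compacted partition.

    partitions maps runid -> list of row dicts. Rows from the newer run
    replace rows with the same key in the previously compacted data.
    """
    latest = {row[key]: row for row in partitions.get(COMPACT, [])}
    for row in partitions[run_id]:
        latest[row[key]] = row  # the newer run wins
    partitions[COMPACT] = list(latest.values())  # "insert overwrite"
    return partitions


def base_view(partitions):
    """Equivalent of: select * from base where runid = compact."""
    return partitions.get(COMPACT, [])


def history_view(partitions):
    """Equivalent of: select * from base where runid != compact."""
    return [row for rid, rows in partitions.items() if rid != COMPACT for row in rows]


if __name__ == "__main__":
    parts = {1: [{"id": "a", "v": 1}, {"id": "b", "v": 1}]}
    compact(parts, 1)
    parts[2] = [{"id": "a", "v": 2}]
    compact(parts, 2)
    print(sorted(base_view(parts), key=lambda r: r["id"]))
    print(len(history_view(parts)))
```

Because the compacted partition is rewritten from only two inputs, the cost of each compaction depends on the size of the current data and the latest load, not on the number of historical runs.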
  • 21. Job Status Management. Job-level APIs: InitJob; HaltJob (success); TermJob (failure). Step-level APIs: InitStep; HaltStep (success); TermStep (failure). All calls update the BDRE Operational Metadata. Diagram components: MQ with Success queue and Fail queue; Consumer; JIRA. Halt and TermJob APIs can send a message to the MQ for proactive alerting. Alternatively BDRE could connect directly to any alerting/ticket management system, skipping the MQ
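The job status flow above can be sketched with in-memory stand-ins. Everything named here is an assumption for illustration: the state strings, the message fields, and the use of `queue.Queue` in place of a real message queue are hypothetical, and the functions are not BDRE's REST APIs.

```python
import queue

# Hypothetical stand-ins for the operational metadata store and the MQ.
job_state = {}
success_queue = queue.Queue()
fail_queue = queue.Queue()


def init_job(process_id):
    """Mark a job as started in the metadata store."""
    job_state[process_id] = "RUNNING"


def halt_job(process_id):
    """Mark a job as succeeded and publish to the success queue."""
    job_state[process_id] = "SUCCEEDED"
    success_queue.put({"process_id": process_id, "event": "HaltJob"})


def term_job(process_id, reason):
    """Mark a job as failed and publish to the fail queue for alerting."""
    job_state[process_id] = "FAILED"
    fail_queue.put({"process_id": process_id, "event": "TermJob", "reason": reason})


def drain_alerts():
    """Consumer side: collect failure messages, e.g. to open tickets."""
    alerts = []
    while not fail_queue.empty():
        alerts.append(fail_queue.get())
    return alerts


if __name__ == "__main__":
    init_job(100)
    term_job(100, "step 102 failed")
    print(job_state[100], drain_alerts())
```

Routing failures through a queue keeps the job APIs independent of whichever ticketing system consumes the messages, which is the trade-off the slide's two bullets describe.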
  • 22. Replicating Similar Jobs with add-on Process. Core Process Template: Data Ingestion (Process 1), Semantic (Process 2), Analytics (Process 3). Properties Template: Properties for 1, Properties for 2, Properties for 3. Actions: CREATE NEW REPLICA / PUSH CHANGES
  • 23. Hive Partition Pruning. Process X (Hive query running from Oozie): Insert overwrite T2 Partition (country, state, runid) Select col1, col2, ${target_batchid}, country_col, state_col, ${instanceexecid} from T1 Where batchid between ${min_batch} and ${max_batch}; InitJob values returned from BDRE Operational Metadata: target_batchid=201, instanceexecid=3, min_src_batch=3, max_src_batch=5. Base Hive Table T1 (populated by Y, upstream of X), partitioned on batchid: batchid=1, batchid=2, batchid=3, batchid=4, batchid=4; some batches were already processed in earlier runs, the rest are unprocessed batches from the queue. Semantic Hive Table T2 (populated by process X): first business partition on country (Country=us, Country=uk), second business partition on state (State=ma, State=ri, State=ca, State=tx, State=en, State=sk), then a partition on runid (Runid=1)
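The way run-control values bound the source scan can be sketched by rendering a query template. This is an illustration only: the template text is a cleaned-up reading of the slide's query, the placeholder names follow the InitJob values shown on the slide (`min_src_batch`, `max_src_batch`), and `render_query` is a hypothetical helper, not a BDRE function.

```python
from string import Template

# Hypothetical template modelled on the slide's query. The WHERE clause
# restricts the scan of T1 to the batch partitions that are still pending.
QUERY = Template(
    "INSERT OVERWRITE TABLE T2 PARTITION (country, state, runid) "
    "SELECT col1, col2, ${target_batchid}, country_col, state_col, ${instanceexecid} "
    "FROM T1 WHERE batchid BETWEEN ${min_src_batch} AND ${max_src_batch}"
)


def render_query(init_job_values):
    """Substitute the values returned by InitJob into the Hive query."""
    return QUERY.substitute(init_job_values)


if __name__ == "__main__":
    # Values as shown on the slide.
    print(render_query({
        "target_batchid": 201,
        "instanceexecid": 3,
        "min_src_batch": 3,
        "max_src_batch": 5,
    }))
```

Since T1 is partitioned on batchid, a range predicate on that column lets Hive skip the partitions already processed in earlier runs.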
  • 24. Data Quality Component. Rule definition: Rule engine UI; Guvnor API; Rules. Execution on Hadoop: Map only MR job (Mapper 1, Mapper 2, ... Mapper n) reads the original file with all records and splits it into Good records and Bad records
  • 25. Batch Management. Workflows: Workflow id 100 / Process 100 (101, 102, 103); Workflow id 200 / Process 200 (201, 202, 203, 204, 205); Workflow 300 / Process 300 (301, 302, 303, 304); Workflow id 400 / Process 400 (400, 401, 402). Batch queues sit between the workflows and form the logical pipeline between the processes. A row is added to the queue table for all downstream processes upon each successful execution of an upstream process. The downstream process looks up the queue and processes all pending batches enqueued by upstream. Multiple source batches consumed = one target batch produced
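The queueing rule on this slide (one queue row per downstream process on each upstream success; a downstream run consumes all pending batches and produces one target batch) can be sketched as follows. The class and method names are hypothetical and the queue is held in memory; in BDRE the queue is a metadata table.

```python
from itertools import count


class BatchQueue:
    """In-memory sketch of the batch consumption queue."""

    def __init__(self, downstream_of):
        # downstream_of maps an upstream process id to its downstream process ids.
        self.downstream_of = downstream_of
        self.pending = {}  # downstream process id -> list of source batch ids
        self._ids = count(1)

    def new_batch(self):
        """Allocate the next batch id."""
        return next(self._ids)

    def on_success(self, upstream_pid, batch_id):
        """Add one queue row per downstream process for a finished batch."""
        for pid in self.downstream_of.get(upstream_pid, []):
            self.pending.setdefault(pid, []).append(batch_id)

    def run(self, pid):
        """Consume all pending source batches and produce one target batch."""
        sources = self.pending.pop(pid, [])
        if not sources:
            return None  # nothing enqueued by upstream yet
        target = self.new_batch()
        self.on_success(pid, target)  # feed this process's own downstream
        return {"process": pid, "source_batches": sources, "target_batch": target}


if __name__ == "__main__":
    q = BatchQueue({100: [200], 200: [300]})
    q.on_success(100, q.new_batch())
    q.on_success(100, q.new_batch())
    print(q.run(200))
    print(q.run(200))
```

In this sketch a downstream process that runs less often than its upstream simply picks up a larger set of source batches on its next run, which matches the "multiple source batches, one target batch" rule.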