1CONFIDENTIAL
Ready-made comprehensive infrastructure solution for Hadoop Big Data and AWS with Cloudera CDH 5.x
May 2017
2CONFIDENTIAL
CLIENT
BUSINESS PROBLEM
CLIENT turned to EPAM to resolve performance and stability issues caused by growing data volumes.
CLIENT also has business requirements to implement advanced analytics and advanced sales techniques.
IMPLEMENTATION
• EPAM is responsible for system engineering of the enterprise data lake, which is based on the Hadoop technology stack.
• EPAM is responsible for ETL implementation from other internal systems as well as external data providers.
• EPAM is responsible for development and maintenance of one of the key areas: Customer Value Management flows.
• EPAM developed flexible store-level dashboards in Tableau to provide real-time insight into the sales process.
3CONFIDENTIAL
Before EPAM
4CONFIDENTIAL
Proposed state
5CONFIDENTIAL
EMR vs CLOUDERA CDH 5.x
6CONFIDENTIAL
CLOUDERA CDH 5.x
7CONFIDENTIAL
HUE with CDH-5
[desktop]
#database_logging=True
django_debug_mode=True
collect_usage=False
use_new_editor=True
use_new_side_panels=True
app_blacklist=spark,zookeeper,search,indexer,sqoop,pig,jobsub,rdbms
[[auth]]
backend=desktop.auth.backend.PamBackend,desktop.auth.backend.AllowFirstUserDjangoBackend
pam_service=sudo sshd login
#idle_session_timeout=120
[[session]]
expire_at_browser_close=True
[hbase]
#hbase_conf_dir=/etc/hbase/conf
hbase_conf_dir={{HBASE_CONF_DIR}}
hbase_clusters=(Peach|ip-172-31-46-118.eu-west-1.compute.internal:9090)
[impala]
[[ssl]]
enabled=true
validate=false
[beeswax]
hive_server_host=ip-172-31-46-119.eu-west-1.compute.internal
[[ssl]]
enabled=false
validate=false
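Hue's configuration nests sections: a name in double brackets is a subsection of the most recent single-bracket section, which is why the two [[ssl]] blocks above apply to different services. The nesting rule can be sketched in Python as follows. `parse_nested_ini` is a hypothetical helper written for illustration, not Hue's own loader, and it handles only the subset of syntax shown on this slide.

```python
def parse_nested_ini(text):
    """Parse a small subset of Hue-style INI into nested dicts.

    Illustrative only: supports [section], [[subsection]], key=value
    and '#' comment lines; it is not Hue's actual configuration loader.
    """
    root = {}
    stack = [root]  # stack[d] is the dict currently open at depth d
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("[") and line.endswith("]"):
            depth = len(line) - len(line.lstrip("["))
            name = line.strip("[]").strip()
            if depth > len(stack):
                raise ValueError("subsection without parent: " + line)
            del stack[depth:]  # close deeper or sibling sections
            section = stack[-1].setdefault(name, {})
            stack.append(section)
        else:
            key, _, value = line.partition("=")
            stack[-1][key.strip()] = value.strip()
    return root


# Excerpt of the settings from the slide.
SAMPLE = """
[impala]
[[ssl]]
enabled=true
validate=false
[beeswax]
hive_server_host=ip-172-31-46-119.eu-west-1.compute.internal
[[ssl]]
enabled=false
validate=false
"""

config = parse_nested_ini(SAMPLE)
print(config["impala"]["ssl"]["enabled"])   # SSL setting for Impala
print(config["beeswax"]["ssl"]["enabled"])  # SSL setting for HiveServer2 (beeswax)
```

Keeping the parent section on a stack means each [[ssl]] block is stored under its own service, so the Impala and Hive settings cannot overwrite each other.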
8CONFIDENTIAL
AWS CLOUD FORMATION
9CONFIDENTIAL
Let’s Encrypt implementation
# renew_before_expiry = 30 days
version = 0.13.0
archive_dir = /etc/letsencrypt/archive/cdm.aptest.CLIENT.com
cert = /etc/letsencrypt/live/cdm.aptest.CLIENT.com/cert.pem
privkey = /etc/letsencrypt/live/cdm.aptest.CLIENT.com/privkey.pem
chain = /etc/letsencrypt/live/cdm.aptest.CLIENT.com/chain.pem
fullchain = /etc/letsencrypt/live/cdm.aptest.CLIENT.com/fullchain.pem
# Options used in the renewal process
[renewalparams]
authenticator = standalone
installer = None
account = 69eadfb4d56ff298317fea965987659a
standalone_supported_challenges = http-01
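The commented-out `renew_before_expiry = 30 days` line records the default renewal window: a certificate is treated as due for renewal once fewer than 30 days of validity remain. The date arithmetic can be sketched in Python as below. `renewal_due` and the sample dates are hypothetical, the 90-day lifetime is the one Let's Encrypt used at the time, and in practice `certbot renew` makes this decision itself.

```python
from datetime import datetime, timedelta

RENEW_BEFORE_EXPIRY = timedelta(days=30)  # mirrors the commented default above


def renewal_due(not_after, now, window=RENEW_BEFORE_EXPIRY):
    """Return True when the certificate expires within the renewal window."""
    return not_after - now < window


# Hypothetical dates for illustration: a certificate issued on
# 1 March 2017 with a 90-day lifetime expires on 30 May 2017.
issued = datetime(2017, 3, 1)
not_after = issued + timedelta(days=90)

print(renewal_due(not_after, datetime(2017, 4, 15)))  # 45 days left
print(renewal_due(not_after, datetime(2017, 5, 10)))  # 20 days left
```

With a 90-day lifetime and a 30-day window, a renewal job that runs daily has roughly a month of retries before the certificate actually expires.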
10CONFIDENTIAL
Zabbix implementation
11CONFIDENTIAL
Chef server implementation
name 'users'
maintainer 'Chef Software, Inc.'
maintainer_email 'cookbooks@chef.io'
license 'Apache 2.0'
description 'Creates users from a databag search'
long_description IO.read(File.join(File.dirname(__FILE__), 'README.md'))
version '1.8.3'
recipe 'users::default', 'Empty recipe for including LWRPs'
recipe 'users::sysadmins', 'Create and manage sysadmin group'
%w( ubuntu debian redhat centos fedora freebsd mac_os_x scientific oracle amazon ).each do |os|
  supports os
end
source_url 'https://github.com/chef-cookbooks/users' if respond_to?(:source_url)
issues_url 'https://github.com/chef-cookbooks/users/issues' if respond_to?(:issues_url)
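The cookbook description says users are created "from a databag search". The selection step can be sketched in plain Python, without Chef, as shown below. The item fields (`id`, `groups`, `action`) and the rule of skipping items marked `remove` are assumptions about the data bag layout rather than something this slide shows, and the sample users are hypothetical.

```python
# Hypothetical data bag items; the field names are assumptions.
USERS_DATA_BAG = [
    {"id": "alice", "groups": ["sysadmin"], "shell": "/bin/bash"},
    {"id": "bob", "groups": ["analysts"]},
    {"id": "carol", "groups": ["sysadmin"], "action": "remove"},
]


def select_users(items, group):
    """Return ids of users in `group` that are not marked for removal."""
    return [
        item["id"]
        for item in items
        if group in item.get("groups", []) and item.get("action") != "remove"
    ]


print(select_users(USERS_DATA_BAG, "sysadmin"))
```

Keeping account definitions as data and selecting them by group is what lets one recipe manage the same administrators across all nodes.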
12CONFIDENTIAL
Jenkins implementation
13CONFIDENTIAL
Tableau Implementation
14CONFIDENTIAL
Tableau Implementation
15CONFIDENTIAL
Kerberos Implementation
16CONFIDENTIAL
CLIENT
Server deployment diagram
[Diagram. Components shown: Internet, client PCs, Bastion, NAT, Hue LB, Tableau LB, Hive LB, Peach Hue LB, Peach Hive2 LB, Peach CDH5 LB, Hue, Hive2, Oozie, Name Node, Data Node, Node Manager, Resource Manager, Cloudera, Zabbix, Jenkins, Chef-Server, Tableau DC1, Tableau DC2, Tableau Prod, Tableau Worker1, Tableau Worker2, Tableau Backup, Peach-Cluster-n1 … Peach-Cluster-n4, test nodes, Jupyter, R-Studio]
Key features:
- Production and staging environments
- HA mode (Hadoop and Tableau)
- DMZ configuration
- Centralized configuration and monitoring
- Dedicated analytics server DMZ configuration (Jupyter and R-Studio)
- CM env
17CONFIDENTIAL
CLIENT: BIG DATA EXPERIENCE
KEY TECHNICAL HIGHLIGHTS
• Data integration projects implementing the new platform into the enterprise fabric, based on Enterprise Data Hub
• Developed a security model for the Big Data solution (Kerberos)
• Implemented production and staging environments
• Rapid ETL development using Python, Pig, Hive, and MapReduce over Hadoop
• >30 ETL jobs
• Bringing in unstructured and semi-structured data
• Integration with enterprise infrastructure
• Using Tableau for rich data visualization
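The Python and MapReduce ETL mentioned in the highlights can be pictured with a small map/reduce sketch. This is an illustration under stated assumptions, not one of the project's jobs: the `store_id,amount` record format and the sample records are hypothetical, and a local sort stands in for Hadoop's shuffle phase.

```python
from itertools import groupby
from operator import itemgetter


def mapper(lines):
    """Emit (store_id, amount) pairs from 'store_id,amount' records."""
    for line in lines:
        store_id, amount = line.strip().split(",")
        yield store_id, float(amount)


def reducer(pairs):
    """Sum amounts per key; the input must already be sorted by key."""
    for key, group in groupby(pairs, key=itemgetter(0)):
        yield key, sum(amount for _, amount in group)


# Hypothetical sales records for illustration.
records = ["s1,10.0", "s2,5.5", "s1,2.5"]
shuffled = sorted(mapper(records), key=itemgetter(0))  # stands in for shuffle/sort
totals = dict(reducer(shuffled))
print(totals)
```

The reducer relies on its input being grouped by key, which is the guarantee the framework's sort phase provides between the map and reduce steps.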
18CONFIDENTIAL
EPAM BIG DATA COMPETENCY CENTER
Big Data Architecture Design, Implementation, and Support
• Data Strategy, Data Governance Consulting
• Data Hub/Lake architecture
• Advanced Solutions Development, Predictive & Prescriptive Analytics
• Infrastructure Implementation and Integration with Enterprise Security
• Big Data Solutions & Platform Support
Top Facts
• 300+ Engineers, Architects and Consultants
• 50+ successfully delivered Big Data and HPC solutions
• 10+ years of BI product development history (for SAP, Oracle, Pentaho)
Big Data CC value for clients
• We understand business and how to make BI & Big Data technology work
• Our design is straightforward while following industry best practices
• EPAM process is interactive, iterative and highly effective
• Quick development and timely implementation
• Proven delivery approach
Deep expertise with cutting edge technologies
19CONFIDENTIAL
EPAM OFFERING
TOP MESSAGES AND KEY FACTS
TECHNOLOGIES
EPAM Data Science includes Data Scientists and Senior Solution Architects with MS/PhD degrees in Applied Math, Physics, Computer Science, and Predictive Analytics. Along with a strong mathematical background, the group has vertical expertise in multiple industries and extensive practical software development skills related to Big Data and conventional DW technologies.
• 100+ Data Scientists
• 20+ Data Modelers
• 10+ Data Strategists
• 50+ successfully delivered projects
TOP STORIES
DATA SCIENCE: DISCOVER AND PREDICT
Services
• Large scale information solutions design
• Predictive model building and validation
• Customer segmentation analysis
• Data profiling and preparation
• Dimension reduction techniques
Mathematical foundation
• Probability, statistics, and stochastic processes
• Supervised and unsupervised machine learning techniques
• Numerical methods and implementations
• Optimization theory
Predictive models: linear and logistic regression, decision trees, clustering, naïve Bayes, support vector machines, neural networks, kernel estimation, panel data analysis, survival/duration analysis, and time series analysis

ITsubbotnik Spring 2017: Dmitriy Yatsyuk, "Ready-made comprehensive infrastructure solution for Hadoop Big Data and AWS with Cloudera CDH 5.x"

Data Engineer's Lunch #60: Series - Developing Enterprise Consciousness
Data Engineer's Lunch #60: Series - Developing Enterprise ConsciousnessData Engineer's Lunch #60: Series - Developing Enterprise Consciousness
Data Engineer's Lunch #60: Series - Developing Enterprise Consciousness
Anant Corporation
 
Skilwise Big data
Skilwise Big dataSkilwise Big data
Skilwise Big data
Skillwise Group
 
Bring Your SAP and Enterprise Data to Hadoop, Kafka, and the Cloud
Bring Your SAP and Enterprise Data to Hadoop, Kafka, and the CloudBring Your SAP and Enterprise Data to Hadoop, Kafka, and the Cloud
Bring Your SAP and Enterprise Data to Hadoop, Kafka, and the Cloud
DataWorks Summit
 
Skillwise Big Data part 2
Skillwise Big Data part 2Skillwise Big Data part 2
Skillwise Big Data part 2
Skillwise Group
 
How to Increase Performance in IBM Cognos
How to Increase Performance in IBM CognosHow to Increase Performance in IBM Cognos
How to Increase Performance in IBM Cognos
Cresco International
 
How to Leverage Mainframe Data with Hadoop: Bridging the Gap Between Big Iron...
How to Leverage Mainframe Data with Hadoop: Bridging the Gap Between Big Iron...How to Leverage Mainframe Data with Hadoop: Bridging the Gap Between Big Iron...
How to Leverage Mainframe Data with Hadoop: Bridging the Gap Between Big Iron...
Precisely
 
Better Total Value of Ownership (TVO) for Complex Analytic Workflows with the...
Better Total Value of Ownership (TVO) for Complex Analytic Workflows with the...Better Total Value of Ownership (TVO) for Complex Analytic Workflows with the...
Better Total Value of Ownership (TVO) for Complex Analytic Workflows with the...
ModusOptimum
 
Big Data Made Easy: A Simple, Scalable Solution for Getting Started with Hadoop
Big Data Made Easy:  A Simple, Scalable Solution for Getting Started with HadoopBig Data Made Easy:  A Simple, Scalable Solution for Getting Started with Hadoop
Big Data Made Easy: A Simple, Scalable Solution for Getting Started with Hadoop
Precisely
 
Cloud nativecomputingtechnologysupportinghpc cognitiveworkflows
Cloud nativecomputingtechnologysupportinghpc cognitiveworkflowsCloud nativecomputingtechnologysupportinghpc cognitiveworkflows
Cloud nativecomputingtechnologysupportinghpc cognitiveworkflows
Yong Feng
 
Robin_Hadoop
Robin_HadoopRobin_Hadoop
Robin_Hadoop
Robin David
 
Pivotal deep dive_on_pivotal_hd_world_class_hdfs_platform
Pivotal deep dive_on_pivotal_hd_world_class_hdfs_platformPivotal deep dive_on_pivotal_hd_world_class_hdfs_platform
Pivotal deep dive_on_pivotal_hd_world_class_hdfs_platform
EMC
 
IBMHadoopofferingTechline-Systems2015
IBMHadoopofferingTechline-Systems2015IBMHadoopofferingTechline-Systems2015
IBMHadoopofferingTechline-Systems2015
Daniela Zuppini
 
flexpod_hadoop_cloudera
flexpod_hadoop_clouderaflexpod_hadoop_cloudera
flexpod_hadoop_cloudera
Prem Jain
 
Tame that Beast
Tame that BeastTame that Beast
Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...
Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...
Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...
Precisely
 
Oracle Big Data Appliance and Big Data SQL for advanced analytics
Oracle Big Data Appliance and Big Data SQL for advanced analyticsOracle Big Data Appliance and Big Data SQL for advanced analytics
Oracle Big Data Appliance and Big Data SQL for advanced analytics
jdijcks
 
6. real time integration with odi 11g & golden gate 11g & dq 11g 20101103 -...
6. real time integration with odi 11g & golden gate 11g & dq 11g   20101103 -...6. real time integration with odi 11g & golden gate 11g & dq 11g   20101103 -...
6. real time integration with odi 11g & golden gate 11g & dq 11g 20101103 -...
Doina Draganescu
 
Naman_Abinitio_7757021406
Naman_Abinitio_7757021406Naman_Abinitio_7757021406
Naman_Abinitio_7757021406
Naman Gupta
 
Data Con LA 2018 - Populating your Enterprise Data Hub for Next Gen Analytics...
Data Con LA 2018 - Populating your Enterprise Data Hub for Next Gen Analytics...Data Con LA 2018 - Populating your Enterprise Data Hub for Next Gen Analytics...
Data Con LA 2018 - Populating your Enterprise Data Hub for Next Gen Analytics...
Data Con LA
 
PradeepDWH
PradeepDWHPradeepDWH
PradeepDWH
Pradeep Pandey
 

Similar to ITsubbotnik Spring 2017: Dmitriy Yatsyuk "Готовое комплексное инфраструктурное решение для Hadoop Big data и AWS с Cloudera CDH 5.x" (20)

Data Engineer's Lunch #60: Series - Developing Enterprise Consciousness
Data Engineer's Lunch #60: Series - Developing Enterprise ConsciousnessData Engineer's Lunch #60: Series - Developing Enterprise Consciousness
Data Engineer's Lunch #60: Series - Developing Enterprise Consciousness
 
Skilwise Big data
Skilwise Big dataSkilwise Big data
Skilwise Big data
 
Bring Your SAP and Enterprise Data to Hadoop, Kafka, and the Cloud
Bring Your SAP and Enterprise Data to Hadoop, Kafka, and the CloudBring Your SAP and Enterprise Data to Hadoop, Kafka, and the Cloud
Bring Your SAP and Enterprise Data to Hadoop, Kafka, and the Cloud
 
Skillwise Big Data part 2
Skillwise Big Data part 2Skillwise Big Data part 2
Skillwise Big Data part 2
 
How to Increase Performance in IBM Cognos
How to Increase Performance in IBM CognosHow to Increase Performance in IBM Cognos
How to Increase Performance in IBM Cognos
 
How to Leverage Mainframe Data with Hadoop: Bridging the Gap Between Big Iron...
How to Leverage Mainframe Data with Hadoop: Bridging the Gap Between Big Iron...How to Leverage Mainframe Data with Hadoop: Bridging the Gap Between Big Iron...
How to Leverage Mainframe Data with Hadoop: Bridging the Gap Between Big Iron...
 
Better Total Value of Ownership (TVO) for Complex Analytic Workflows with the...
Better Total Value of Ownership (TVO) for Complex Analytic Workflows with the...Better Total Value of Ownership (TVO) for Complex Analytic Workflows with the...
Better Total Value of Ownership (TVO) for Complex Analytic Workflows with the...
 
Big Data Made Easy: A Simple, Scalable Solution for Getting Started with Hadoop
Big Data Made Easy:  A Simple, Scalable Solution for Getting Started with HadoopBig Data Made Easy:  A Simple, Scalable Solution for Getting Started with Hadoop
Big Data Made Easy: A Simple, Scalable Solution for Getting Started with Hadoop
 
Cloud nativecomputingtechnologysupportinghpc cognitiveworkflows
Cloud nativecomputingtechnologysupportinghpc cognitiveworkflowsCloud nativecomputingtechnologysupportinghpc cognitiveworkflows
Cloud nativecomputingtechnologysupportinghpc cognitiveworkflows
 
Robin_Hadoop
Robin_HadoopRobin_Hadoop
Robin_Hadoop
 
Pivotal deep dive_on_pivotal_hd_world_class_hdfs_platform
Pivotal deep dive_on_pivotal_hd_world_class_hdfs_platformPivotal deep dive_on_pivotal_hd_world_class_hdfs_platform
Pivotal deep dive_on_pivotal_hd_world_class_hdfs_platform
 
IBMHadoopofferingTechline-Systems2015
IBMHadoopofferingTechline-Systems2015IBMHadoopofferingTechline-Systems2015
IBMHadoopofferingTechline-Systems2015
 
flexpod_hadoop_cloudera
flexpod_hadoop_clouderaflexpod_hadoop_cloudera
flexpod_hadoop_cloudera
 
Tame that Beast
Tame that BeastTame that Beast
Tame that Beast
 
Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...
Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...
Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...
 
Oracle Big Data Appliance and Big Data SQL for advanced analytics
Oracle Big Data Appliance and Big Data SQL for advanced analyticsOracle Big Data Appliance and Big Data SQL for advanced analytics
Oracle Big Data Appliance and Big Data SQL for advanced analytics
 
6. real time integration with odi 11g & golden gate 11g & dq 11g 20101103 -...
6. real time integration with odi 11g & golden gate 11g & dq 11g   20101103 -...6. real time integration with odi 11g & golden gate 11g & dq 11g   20101103 -...
6. real time integration with odi 11g & golden gate 11g & dq 11g 20101103 -...
 
Naman_Abinitio_7757021406
Naman_Abinitio_7757021406Naman_Abinitio_7757021406
Naman_Abinitio_7757021406
 
Data Con LA 2018 - Populating your Enterprise Data Hub for Next Gen Analytics...
Data Con LA 2018 - Populating your Enterprise Data Hub for Next Gen Analytics...Data Con LA 2018 - Populating your Enterprise Data Hub for Next Gen Analytics...
Data Con LA 2018 - Populating your Enterprise Data Hub for Next Gen Analytics...
 
PradeepDWH
PradeepDWHPradeepDWH
PradeepDWH
 

More from epamspb

Mobile Open Day: React Native: Crossplatform fast dive
Mobile Open Day: React Native: Crossplatform fast diveMobile Open Day: React Native: Crossplatform fast dive
Mobile Open Day: React Native: Crossplatform fast dive
epamspb
 
Mobile Open Day: Things I wish I'd known about Core Data before getting married
Mobile Open Day: Things I wish I'd known about Core Data before getting marriedMobile Open Day: Things I wish I'd known about Core Data before getting married
Mobile Open Day: Things I wish I'd known about Core Data before getting married
epamspb
 
#ITsubbotnik Spring 2017: Sergey Chibirev/Andrei Ortyashov "Умный дом своими ...
#ITsubbotnik Spring 2017: Sergey Chibirev/Andrei Ortyashov "Умный дом своими ...#ITsubbotnik Spring 2017: Sergey Chibirev/Andrei Ortyashov "Умный дом своими ...
#ITsubbotnik Spring 2017: Sergey Chibirev/Andrei Ortyashov "Умный дом своими ...
epamspb
 
#ITsubbotnik Spring 2017: Stepan Rakitin "Создаем отказоустойчивые распределе...
#ITsubbotnik Spring 2017: Stepan Rakitin "Создаем отказоустойчивые распределе...#ITsubbotnik Spring 2017: Stepan Rakitin "Создаем отказоустойчивые распределе...
#ITsubbotnik Spring 2017: Stepan Rakitin "Создаем отказоустойчивые распределе...
epamspb
 
#ITsubbotnik Spring 2017: Rustam Kadyrov "Как приструнить зоопарк из микросер...
#ITsubbotnik Spring 2017: Rustam Kadyrov "Как приструнить зоопарк из микросер...#ITsubbotnik Spring 2017: Rustam Kadyrov "Как приструнить зоопарк из микросер...
#ITsubbotnik Spring 2017: Rustam Kadyrov "Как приструнить зоопарк из микросер...
epamspb
 
#ITsubbotnik Spring 2017: Sergey Chernolyas "JPA for NoSQL"
#ITsubbotnik Spring 2017: Sergey Chernolyas "JPA for NoSQL"#ITsubbotnik Spring 2017: Sergey Chernolyas "JPA for NoSQL"
#ITsubbotnik Spring 2017: Sergey Chernolyas "JPA for NoSQL"
epamspb
 
#ITsubbotnik Spring 2017: Roman Iovlev "Java edge in test automation"
#ITsubbotnik Spring 2017: Roman Iovlev "Java edge in test automation"#ITsubbotnik Spring 2017: Roman Iovlev "Java edge in test automation"
#ITsubbotnik Spring 2017: Roman Iovlev "Java edge in test automation"
epamspb
 
#ITsubbotnik Spring 2017: Dmitrii Nikitko "Deep learning for understanding of...
#ITsubbotnik Spring 2017: Dmitrii Nikitko "Deep learning for understanding of...#ITsubbotnik Spring 2017: Dmitrii Nikitko "Deep learning for understanding of...
#ITsubbotnik Spring 2017: Dmitrii Nikitko "Deep learning for understanding of...
epamspb
 
#ITsubbotnik Spring 2017: Roman Dimitrenko "Building Paas with the HashiStack"
#ITsubbotnik Spring 2017: Roman Dimitrenko "Building Paas with the HashiStack"#ITsubbotnik Spring 2017: Roman Dimitrenko "Building Paas with the HashiStack"
#ITsubbotnik Spring 2017: Roman Dimitrenko "Building Paas with the HashiStack"
epamspb
 
#ITsubbotnik Spring 2017: Mikhail Khludnev "Search like %SQL%"
#ITsubbotnik Spring 2017: Mikhail Khludnev "Search like %SQL%"#ITsubbotnik Spring 2017: Mikhail Khludnev "Search like %SQL%"
#ITsubbotnik Spring 2017: Mikhail Khludnev "Search like %SQL%"
epamspb
 
#ITsubbotnik Spring 2017: Andriy Filatov "Ансамбль солёных поваров: сравнивае...
#ITsubbotnik Spring 2017: Andriy Filatov "Ансамбль солёных поваров: сравнивае...#ITsubbotnik Spring 2017: Andriy Filatov "Ансамбль солёных поваров: сравнивае...
#ITsubbotnik Spring 2017: Andriy Filatov "Ансамбль солёных поваров: сравнивае...
epamspb
 
#ITsubbotnik Spring 2017: Anton Shapin, Denis Klykov "Visualization, storage ...
#ITsubbotnik Spring 2017: Anton Shapin, Denis Klykov "Visualization, storage ...#ITsubbotnik Spring 2017: Anton Shapin, Denis Klykov "Visualization, storage ...
#ITsubbotnik Spring 2017: Anton Shapin, Denis Klykov "Visualization, storage ...
epamspb
 
#ITsubbotnik Spring 2017: Sergey Mishanin "Report Portal. Руководство для аде...
#ITsubbotnik Spring 2017: Sergey Mishanin "Report Portal. Руководство для аде...#ITsubbotnik Spring 2017: Sergey Mishanin "Report Portal. Руководство для аде...
#ITsubbotnik Spring 2017: Sergey Mishanin "Report Portal. Руководство для аде...
epamspb
 

More from epamspb (13)

Mobile Open Day: React Native: Crossplatform fast dive
Mobile Open Day: React Native: Crossplatform fast diveMobile Open Day: React Native: Crossplatform fast dive
Mobile Open Day: React Native: Crossplatform fast dive
 
Mobile Open Day: Things I wish I'd known about Core Data before getting married
Mobile Open Day: Things I wish I'd known about Core Data before getting marriedMobile Open Day: Things I wish I'd known about Core Data before getting married
Mobile Open Day: Things I wish I'd known about Core Data before getting married
 
#ITsubbotnik Spring 2017: Sergey Chibirev/Andrei Ortyashov "Умный дом своими ...
#ITsubbotnik Spring 2017: Sergey Chibirev/Andrei Ortyashov "Умный дом своими ...#ITsubbotnik Spring 2017: Sergey Chibirev/Andrei Ortyashov "Умный дом своими ...
#ITsubbotnik Spring 2017: Sergey Chibirev/Andrei Ortyashov "Умный дом своими ...
 
#ITsubbotnik Spring 2017: Stepan Rakitin "Создаем отказоустойчивые распределе...
#ITsubbotnik Spring 2017: Stepan Rakitin "Создаем отказоустойчивые распределе...#ITsubbotnik Spring 2017: Stepan Rakitin "Создаем отказоустойчивые распределе...
#ITsubbotnik Spring 2017: Stepan Rakitin "Создаем отказоустойчивые распределе...
 
#ITsubbotnik Spring 2017: Rustam Kadyrov "Как приструнить зоопарк из микросер...
#ITsubbotnik Spring 2017: Rustam Kadyrov "Как приструнить зоопарк из микросер...#ITsubbotnik Spring 2017: Rustam Kadyrov "Как приструнить зоопарк из микросер...
#ITsubbotnik Spring 2017: Rustam Kadyrov "Как приструнить зоопарк из микросер...
 
#ITsubbotnik Spring 2017: Sergey Chernolyas "JPA for NoSQL"
#ITsubbotnik Spring 2017: Sergey Chernolyas "JPA for NoSQL"#ITsubbotnik Spring 2017: Sergey Chernolyas "JPA for NoSQL"
#ITsubbotnik Spring 2017: Sergey Chernolyas "JPA for NoSQL"
 
ITsubbotnik Spring 2017: Dmitriy Yatsyuk "A ready-made comprehensive infrastructure solution for Hadoop Big Data and AWS with Cloudera CDH 5.x"

  • 1. CONFIDENTIAL. Real complex infrastructure solution for Hadoop Big Data and AWS with Cloudera CDH 5.x. May 2017
  • 2. CLIENT
       BUSINESS PROBLEM
       CLIENT turned to EPAM to resolve performance and stability issues linked to growing data volumes. CLIENT also has business requirements to implement advanced analytics and advanced sales techniques.
       IMPLEMENTATION
       • EPAM is responsible for system engineering of the enterprise data lake, which is based on the Hadoop technology stack.
       • EPAM is responsible for ETL implementation from other internal systems as well as external data providers.
       • EPAM is responsible for the development and maintenance of one of the key areas: Customer Value Management flows.
       • EPAM developed flexible store-level dashboards, based on Tableau, that provide real-time insight into the sales process.
  • 7. HUE with CDH-5
       [desktop]
       #database_logging=True
       django_debug_mode=True
       collect_usage=False
       use_new_editor=True
       use_new_side_panels=True
       app_blacklist=spark,zookeeper,search,indexer,sqoop,pig,jobsub,rdbms
       [[auth]]
       backend=desktop.auth.backend.PamBackend,desktop.auth.backend.AllowFirstUserDjangoBackend
       pam_service=sudo sshd login
       #idle_session_timeout=120
       [[session]]
       expire_at_browser_close=True
       [hbase]
       #hbase_conf_dir=/etc/hbase/conf
       hbase_conf_dir={{HBASE_CONF_DIR}}
       hbase_cluster=(Peach|ip-172-31-46-118.eu-west-1.compute.internal:9090)
       [impala]
       [[ssl]]
       enabled=true
       validate=false
       [beeswax]
       hive_server_host=ip-172-31-46-119.eu-west-1.compute.internal
       [[ssl]]
       enabled=false
       validate=false
  • 9. Let's Encrypt implementation
       # renew_before_expiry = 30 days
       version = 0.13.0
       archive_dir = /etc/letsencrypt/archive/cdm.aptest.CLIENT.com
       cert = /etc/letsencrypt/live/cdm.aptest.CLIENT.com/cert.pem
       privkey = /etc/letsencrypt/live/cdm.aptest.CLIENT.com/privkey.pem
       chain = /etc/letsencrypt/live/cdm.aptest.CLIENT.com/chain.pem
       fullchain = /etc/letsencrypt/live/cdm.aptest.CLIENT.com/fullchain.pem
       # Options used in the renewal process
       [renewalparams]
       authenticator = standalone
       installer = None
       account = 69eadfb4d56ff298317fea965987659a
       standalone_supported_challenges = http-01
  • 11. Chef server implementation
        name 'users'
        maintainer 'Chef Software, Inc.'
        maintainer_email 'cookbooks@chef.io'
        license 'Apache 2.0'
        description 'Creates users from a databag search'
        long_description IO.read(File.join(File.dirname(__FILE__), 'README.md'))
        version '1.8.3'
        recipe 'users::default', 'Empty recipe for including LWRPs'
        recipe 'users::sysadmins', 'Create and manage sysadmin group'
        %w( ubuntu debian redhat centos fedora freebsd mac_os_x scientific oracle amazon ).each do |os|
          supports os
        end
        source_url 'https://github.com/chef-cookbooks/users' if respond_to?(:source_url)
        issues_url 'https://github.com/chef-cookbooks/users/issues' if respond_to?(:issues_url)
  • 16. CLIENT server deployment diagram
        Key features:
        - Production and staging environments
        - HA mode (Hadoop and Tableau)
        - DMZ configuration
        - Centralized configuration and monitoring
        - Dedicated analytics server DMZ configuration (Jupyter and R-Studio)
        - CM env
        Diagram labels: Internet; Client PCs; Bastion; NAT; Hive2; Data Node; Node man.; Name Node; Resource Manager; Oozie; Hue; Cloudera; Zabbix; Jenkins; Chef-Server; Hue LB; Hive LB; Tableau LB; Tableau DC1; Tableau DC2; Tableau Prod; Tableau Worker1; Tableau Worker2; Tableau Backup; Peach-Cluster-n1 ….. Peach-Cluster-n4; Test nodes; Jupyter; R-Studio; Peach Hue LB; Peach Hive2 LB; Peach CDH5 LB
  • 17. CLIENT: BIG DATA EXPERIENCE
        KEY TECHNICAL HIGHLIGHTS
        • Data integration projects that bring the new platform into the enterprise fabric, based on Enterprise Data Hub
        • Developed a security model for the Big Data solution (Kerberos)
        • Implemented production and staging environments
        • Rapid ETL development using Python, Pig, Hive, and MapReduce over Hadoop
        • >30 ETL jobs
        • Bringing in unstructured and semi-structured data
        • Integration with enterprise infrastructure
        • Using Tableau for rich data visualization
  • 18. EPAM BIG DATA COMPETENCY CENTER
        Panel headings: Big Data Architecture Design, Implementation, and Support; Big Data CC value for clients; Deep expertise with cutting-edge technologies; Top Facts
        • Data Strategy, Data Governance Consulting
        • Data Hub/Lake architecture
        • Advanced Solutions Development, Predictive & Prescriptive Analytics
        • Infrastructure Implementation and Integration with Enterprise Security
        • Big Data Solutions & Platform Support
        • 300+ Engineers, Architects and Consultants
        • 50+ successfully delivered Big Data and HPC solutions
        • 10+ years of BI product development history (for SAP, Oracle, Pentaho)
        • We understand business and how to make BI & Big Data technology work
        • Our design is straightforward while following industry best practices
        • EPAM process is interactive, iterative and highly effective
        • Quick development and timely implementation
        • Proven delivery approach
  • 19. DATA SCIENCE: DISCOVER AND PREDICT
        Panel headings: EPAM OFFERING; TOP MESSAGES AND KEY FACTS; TECHNOLOGIES; TOP STORIES
        EPAM Data Science includes Data Scientists and Senior Solution Architects with MS/PhD degrees in Applied Math, Physics, Computer Science, and Predictive Analytics. Along with a strong mathematical background, the group has vertical expertise in multiple industries and extensive practical software development skills related to Big Data and conventional DW technologies.
        • 100+ Data Scientists
        • 20+ Data Modelers
        • 10+ Data Strategists
        • 50+ successfully delivered projects
        Services
        • Large scale information solutions design
        • Predictive model building and validation
        • Customer segmentation analysis
        • Data profiling and preparation
        • Dimension reduction techniques
        Mathematical foundation
        • Probability, statistics, and stochastic processes
        • Supervised and unsupervised machine learning techniques
        • Numerical methods and implementations
        • Optimization theory
        Predictive models: linear and logistic regression, decision trees, clustering, naïve Bayes, support vector machines, neural networks, kernel estimation, panel data analysis, survival/duration analysis, and time series analysis
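
The Hue configuration on slide 7 uses nested section headers (`[desktop]`, then `[[auth]]` inside it). As a rough illustration of how that nesting can be read, here is a minimal Python sketch. The function names `parse_hue_ini` and `review_flags` are hypothetical, the sample is abbreviated from the slide, and the choice of which settings to flag (debug mode on, TLS enabled without certificate validation) is an assumption made for this example, not something the slides state.

```python
def parse_hue_ini(text):
    """Parse INI text with nested [section] / [[subsection]] headers into dicts."""
    root = {}
    stack = [root]  # stack[d] is the dict of the open section at depth d
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and commented-out settings
        if line.startswith("[") and line.endswith("]"):
            depth = len(line) - len(line.lstrip("["))  # number of leading '['
            name = line.strip("[]").strip()
            del stack[depth:]  # close sections at this depth or deeper
            section = stack[-1].setdefault(name, {})
            stack.append(section)
        elif "=" in line:
            key, value = line.split("=", 1)  # split on the first '=' only
            stack[-1][key.strip()] = value.strip()
    return root


def review_flags(config):
    """Return settings worth a second look (illustrative choice, see lead-in)."""
    findings = []
    desktop = config.get("desktop", {})
    if desktop.get("django_debug_mode", "").lower() == "true":
        findings.append("desktop.django_debug_mode is enabled")
    for name, section in config.items():
        if not isinstance(section, dict):
            continue
        ssl = section.get("ssl")
        if (isinstance(ssl, dict)
                and ssl.get("enabled", "").lower() == "true"
                and ssl.get("validate", "").lower() == "false"):
            findings.append(f"{name}.ssl has validation disabled")
    return findings


# Abbreviated from slide 7.
SAMPLE = """
[desktop]
django_debug_mode=True
[[auth]]
pam_service=sudo sshd login
[impala]
[[ssl]]
enabled=true
validate=false
[beeswax]
[[ssl]]
enabled=false
validate=false
"""

if __name__ == "__main__":
    for finding in review_flags(parse_hue_ini(SAMPLE)):
        print(finding)
```

The parser keeps every value as a string and does no validation; it is meant only to show how the bracket depth maps to nesting.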
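
The renewal file on slide 9 shows the commented default `renew_before_expiry = 30 days`. The date arithmetic behind that setting can be sketched as follows. This is a minimal illustration: `renewal_due` is a hypothetical helper, the issue date is made up, and the 90-day certificate lifetime is an assumption based on Let's Encrypt's usual validity period rather than anything shown on the slide.

```python
from datetime import datetime, timedelta

# Window taken from the commented default on slide 9.
RENEW_BEFORE_EXPIRY = timedelta(days=30)

# Assumption: certificate lifetime of 90 days.
CERT_LIFETIME = timedelta(days=90)


def renewal_due(not_after, now, window=RENEW_BEFORE_EXPIRY):
    """Return True once `now` falls inside the renewal window before expiry."""
    return now >= not_after - window


if __name__ == "__main__":
    issued = datetime(2017, 5, 1)        # hypothetical issue date
    not_after = issued + CERT_LIFETIME   # expiry under the 90-day assumption
    for day in (datetime(2017, 6, 29), datetime(2017, 6, 30)):
        print(day.date(), "renew" if renewal_due(not_after, day) else "wait")
```

With these assumed dates the certificate becomes eligible for renewal 60 days after issue, which leaves a 30-day margin for retries if a renewal attempt fails.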

Editor's Notes

  1. A ready-made, comprehensive infrastructure solution for Hadoop Big Data on Amazon AWS, based on CDH 5.x. The output of this solution will be used by business analysts. The example is a company that retails branded clothing and perfumery. Built around: resource savings, security, ease of use and support.
  2. "Repair school." What the client came with: everything was bad and kept crashing. We got out of the hole, and the client is happy.
  3. And now, why this worked: because we have the EPAM Competency Center.
  4. We understand business and how BI & Big Data technology works. Our design is straightforward and follows industry best practices. The EPAM process is interactive, iterative and highly effective. Quick development and timely implementation. Proven delivery approach. Data strategy, data governance consulting. Data hub/lake architecture. Advanced solutions development, predictive and prescriptive analytics. Infrastructure implementation and integration with enterprise security. Big Data solutions and platform support.
  5. EPAM Data Science includes data scientists and senior solution architects with MS/PhD degrees in applied mathematics, physics, computer science and predictive analytics. Along with a strong mathematical background, the group has experience across multiple industries and extensive practical software development skills related to Big Data and conventional DW technologies. Large-scale information solutions design. Predictive model building and validation. Customer segmentation analysis. Data profiling and preparation. Dimension reduction techniques. Anyone who wants to take part ….. Colleagues, thank you for your attention. I will be glad if someone finds this useful and applies it in their work and career.