This document discusses how Informatica's Big Data Edition and Vibe Data Stream products can be used for offloading data warehousing to Hadoop. It provides an overview of each product and how they help with challenges of developing and maintaining Hadoop-based data warehouses by improving developer productivity, making skills easier to acquire, and lowering risks. It also includes a demo of how the products integrate various data sources and platforms.
Integrate Big Data into Your Organization with Informatica and Perficient (Perficient, Inc.)
This document discusses how Perficient, an IT consulting firm, can help clients integrate big data into their organizations at lower total costs. It provides an overview of Perficient's services and solutions expertise in areas like business intelligence, customer experience, enterprise resource planning, and mobile platforms. The document also profiles Perficient with details on its history, locations, colleagues, and partnership model. Finally, it outlines an agenda for an event on balancing innovation and costs with big data, including discussions on PowerCenter Big Data Edition and what customers are doing with Informatica and big data.
This document provides an overview and agenda for a presentation on Informatica Big Data and Social Media Integration. It discusses Informatica and its platform, integration with Hadoop and big data, social media integration, and Informatica's telecom network streaming integration solution. The document includes information on Informatica's history and partnerships, how its platform addresses traditional ETL challenges, its features for working with Hadoop and big data, and how it can be used to integrate social media data and perform customer sentiment analysis.
8.17.11 Big Data and Hadoop with Informatica (Julianna DeLua)
This presentation provides a briefing on Big Data and Hadoop and how Informatica's Big Data Integration plays a role to empower the data-centric enterprise.
My presentation slides from Hadoop Summit, San Jose, June 28, 2016. See live video at http://www.makedatauseful.com/vid-solving-performance-problems-hadoop/ and follow along for context.
Moving analytic workloads into production: specific technical challenges and best practices for engineering SQL-in-Hadoop solutions, highlighting the next-generation engineering approaches behind the secret sauce implemented in the Actian VectorH database.
This document discusses how Hadoop can be used in data warehousing and analytics. It begins with an overview of data warehousing and analytical databases. It then describes how organizations traditionally separate transactional and analytical systems and use extract, transform, load processes to move data between them. The document proposes using Hadoop as an alternative to traditional data warehousing architectures by using it for extraction, transformation, loading, and even serving analytical queries.
Best Practices for the Hadoop Data Warehouse: EDW 101 for Hadoop Professionals (Cloudera, Inc.)
The enormous legacy of EDW experience and best practices can be adapted to the unique capabilities of the Hadoop environment. In this webinar, in a point-counterpoint format, Dr. Kimball will describe standard data warehouse best practices including the identification of dimensions and facts, managing primary keys, and handling slowly changing dimensions (SCDs) and conformed dimensions. Eli Collins, Chief Technologist at Cloudera, will describe how each of these practices actually can be implemented in Hadoop.
Join Cloudera’s founder and Chief Scientist, Jeff Hammerbacher, as he describes ten common problems that are being solved with Apache Hadoop.
A replay of the webinar can be viewed here:
https://www1.gotomeeting.com/register/719074008
This document provides an introduction to Apache Kudu, a storage layer for Apache Hadoop designed for fast analytics on fast data. It discusses Kudu's motivations of filling gaps in HDFS and HBase capabilities, its design goals of high throughput scans and low latency reads/writes, and how its columnar storage and integration with tools like Spark and Impala enable it to meet these goals. Example use cases like time series and real-time analytics are presented. The document also covers Kudu's architecture of tables and tablets, its replication and fault tolerance model using Raft consensus, and performance comparisons that show it outperforming other storage systems.
How One Company Offloaded Data Warehouse ETL To Hadoop and Saved $30 Million (DataWorks Summit)
A Fortune 100 company recently introduced Hadoop into their data warehouse environment and ETL workflow to save $30 Million. This session examines the specific use case to illustrate the design considerations, as well as the economics behind ETL offload with Hadoop. Additional information about how the Hadoop platform was leveraged to support extended analytics will also be referenced.
OpenSearch is a distributed, open-source search and analytics suite used for a wide range of use cases such as real-time application monitoring, log analytics, and website search. Paired with OpenSearch Dashboards, an integrated visualization tool that makes data exploration easy, it provides a highly scalable system with fast access and response times for large data volumes. This session explains how it works internally and, on that basis, covers optimization methods and issues that can arise in operation.
The 17th BOAZ Big Data Conference - [중고책나라]: Elasticsearch Cluster Optimization Using Real-Time Data (BOAZ Bigdata)
The 중고책나라 (Used Book Nation) team carried out the following data engineering project:
Optimizing Elasticsearch indexing cluster performance using real-time used-book data
18th cohort: 금나연, Sookmyung Women's University, IT Engineering
18th cohort: 박규연, Kookmin University, School of Software
18th cohort: 김건우, Kookmin University, Dept. of AI Big Data Convergence Management
This document discusses an open source architecture called giip that provides alternatives to AWS services like Lambda and API Gateway. It allows engineers to choose components from open source or other cloud providers to build applications in a mixed mode. The document provides an overview of the giip architecture and information on how to contact the organization behind it.
http://giipweb.littleworld.net
giip has been a stable automation engine since 2007.
If you have servers (cloud or legacy) and IoT devices such as cameras, speakers, drones, or robots, you can control them all in one place with giip!
The giip engine can replace any of the open source components shown on slide 2.
If you have the relevant knowledge, you can swap out parts of the giip engine.
If you want to join us or form a partnership, contact us. :)
contact@littleworld.net
This introduces a platform that connects everything to form an optimal infrastructure base, builds big data storage and analytics on top of it, and adds artificial intelligence, so it can compete with leaders in the global market.
Built on a multi-vendor-oriented core concept and backed by major financial company S, it is already collecting IoT data from many systems and major enterprises for big data analysis, and through S's robotics support, future drone control, and machine learning over the collected data, it is already heading toward the global market.
We would appreciate your interest and cooperation.
Inquiries and partnerships: contact@littleworld.net
Global Infrastructure Information Service Brochure.
This is a powerful ITAM and system automation service.
It works on Linux and Windows, on any cloud or legacy machine.
You can use many types of scripts:
- wsf (Windows Script Host file)
- vbs (Visual Basic Script)
- bat (Windows batch file)
- sh (Linux shell/Bash)
- py (Python)
- rb (Ruby, Chef)
- go (Go, Docker)
2. Basic Hadoop Configuration
Name Node (Primary): 6Gbps SATA x 2, RAID 1
Name Node (Secondary): 6Gbps SATA x 2, RAID 1
Job Tracker: 6Gbps SATA x 2, RAID 1
DataNode01: 6Gbps SATA x 8, RAID 10
DataNode02: 6Gbps SATA x 8, RAID 10
DataNode03: 6Gbps SATA x 8, RAID 10
DataNode04: 6Gbps SATA x 8, RAID 10
(All nodes are interconnected over 10Gbps links.)
● The network is 10Gbps by default
● A dedicated storage network segment is provisioned
● The Name Nodes and Job Tracker use RAID 1, prioritizing cost-effectiveness
● Example DataNode configurations (compared in the sketch after this list):
○ RAID 0 + 3 replicas
○ RAID 10 + 2 replicas
○ RAID 5 + 3 replicas
○ RAID 6 + 2 replicas
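To compare these DataNode options, here is a minimal sketch (an illustration, not from the slides): the per-RAID usable-disk counts are the standard values for an 8-disk array, and the disk size is an assumed parameter, since the slide does not state one.

```python
# Minimal sketch comparing the DataNode RAID + replica options above.
# Usable-disk counts per RAID level are standard for an 8-disk array;
# DISK_TB is an assumed value (the slide does not state disk sizes).

DATA_NODES = 4        # DataNode01..04 in the diagram
DISKS_PER_NODE = 8    # 6Gbps SATA x 8
DISK_TB = 4           # assumed disk size in TB

RAID_USABLE_DISKS = {
    "RAID 0": 8,      # striping only, no redundancy
    "RAID 10": 4,     # mirrored pairs: half the disks usable
    "RAID 5": 7,      # one disk's worth of parity
    "RAID 6": 6,      # two disks' worth of parity
}

CONFIGS = [("RAID 0", 3), ("RAID 10", 2), ("RAID 5", 3), ("RAID 6", 2)]

raw_tb = DATA_NODES * DISKS_PER_NODE * DISK_TB
for raid, replicas in CONFIGS:
    usable_tb = RAID_USABLE_DISKS[raid] * DATA_NODES * DISK_TB / replicas
    print(f"{raid} + {replicas} replicas: {usable_tb:5.1f} TB usable "
          f"of {raw_tb} TB raw ({usable_tb / raw_tb:.0%})")
```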
3. Sizing Considerations
● Starting capacity, SLA, capacity planning
● Log data, recycled data, data scan frequency
● Data safety, IO performance
● RAID, replica set, data node quantity
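As a worked example of how these factors interact, the sketch below turns a starting capacity, a growth rate, a replica set size, and a RAID efficiency into a data node count. Every input value is an assumption chosen for illustration, not taken from the slides.

```python
import math

# Rough capacity-planning sketch for the sizing factors above. Every input
# value here is an assumption for illustration; plug in your own numbers.

starting_tb = 100          # starting capacity (assumed)
monthly_growth_tb = 10     # log/recycled data growth per month (assumed)
horizon_months = 12        # capacity-planning horizon (assumed)
replicas = 2               # replica set size
raid_efficiency = 4 / 8    # RAID 10 on an 8-disk node: 4 of 8 disks usable
node_raw_tb = 8 * 4        # 8 disks x 4 TB per data node (assumed)

usable_needed = starting_tb + monthly_growth_tb * horizon_months
raw_needed = usable_needed * replicas / raid_efficiency
nodes = math.ceil(raw_needed / node_raw_tb)
print(f"{usable_needed} TB usable -> {raw_needed:.0f} TB raw -> {nodes} data nodes")
```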
4. Performance Table by Configuration (Theoretical)
● Data node spec: 6Gbps SATA disk x 8
● Data node quantity: 8
● Network environment not factored in
● RAID controller environment not factored in
● Baseline: single disk, no RAID = x 1
                 RAID 0,      RAID 10,     RAID 5,      RAID 6,
                 3 Replica    2 Replica    3 Replica    2 Replica
Read             x 21.3       x 16         x 18.6       x 24
Write            x 21.3       x 16         x 2.6        x 4
Capacity         x 21.3       x 16         x 18.6       x 24
Fault            1 disk x     3 disks x    2 disks x    3 disks x
Threshold        3 servers    2 servers    3 servers    2 servers
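The Read and Capacity columns can be reproduced as usable disks per node times data nodes, divided by the replica count. The sketch below shows this; note the formula is inferred from the numbers rather than stated on the slide, and the Write column follows a RAID write-penalty model the slide does not spell out, so it is not derived here.

```python
# Sketch reproducing the Read/Capacity multipliers in the table above as
# (usable disks per node x data nodes) / replicas. The formula is inferred
# from the numbers, not stated on the slide.

DATA_NODES = 8        # data node quantity from the assumptions above
RAID_USABLE_DISKS = {"RAID 0": 8, "RAID 10": 4, "RAID 5": 7, "RAID 6": 6}

for raid, replicas in [("RAID 0", 3), ("RAID 10", 2), ("RAID 5", 3), ("RAID 6", 2)]:
    x = RAID_USABLE_DISKS[raid] * DATA_NODES / replicas
    print(f"{raid}, {replicas} replicas: read/capacity = x {x:.1f}")
# -> x 21.3, x 16.0, x 18.7, x 24.0, matching the table (18.7 vs 18.6 is rounding)
```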
5. Weighting Based on Usage Pattern Analysis
Read %    Write %    IO Performance (MB/sec)
75        25         1805
50        50         1518
25        75         1231
● Reference values
○ Write: 944 MB/sec
○ Read: 2092 MB/sec
● Typical throughput measured with a 2-replica set and 4 data nodes on a 10Gbps network (may vary by environment)
● Used to calculate how many data nodes to add for a required throughput
○ If 3 GB/sec is required, expanding from 4 to 8 data nodes yields about 3.6 GB/sec, satisfying the requirement (actual measurements may differ)
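Each row of the table above is simply a read/write-weighted average of the two reference values, and the scaling estimate assumes throughput grows linearly with data node count. A minimal sketch reproducing both follows; linear scaling is the slide's working assumption and may not hold exactly in practice.

```python
# Sketch reproducing the weighted IO table and the 4 -> 8 node scaling
# estimate. Baselines are the slide's reference values for 4 data nodes;
# linear scaling with node count is an assumption and may differ in practice.

READ_MBPS = 2092      # baseline read, 4 data nodes
WRITE_MBPS = 944      # baseline write, 4 data nodes
BASE_NODES = 4

def weighted_io(read_pct: int, nodes: int = BASE_NODES) -> float:
    """Weighted IO throughput in MB/sec for a given read/write mix."""
    base = READ_MBPS * read_pct / 100 + WRITE_MBPS * (100 - read_pct) / 100
    return base * nodes / BASE_NODES

for read_pct in (75, 50, 25):
    print(f"{read_pct}% read / {100 - read_pct}% write: "
          f"{weighted_io(read_pct):.0f} MB/sec")   # -> 1805, 1518, 1231

# Needing 3 GB/sec at 75% read: doubling to 8 data nodes gives ~3.6 GB/sec.
print(f"8 nodes, 75% read: {weighted_io(75, nodes=8) / 1000:.2f} GB/sec")
```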