Until now, the Web has mostly been considered a Web of documents, more specifically a Web of HTML pages. However, Tim Berners-Lee, the inventor of the Web, considers that the Web has not yet reached its full potential. The Data Web and Linked Data will enable more precise search services, transforming the Web into a smarter and richer Web. Google, for example, uses Linked Data concepts to build its Knowledge Graph and to process voice commands and voice queries from users. Linked Data concepts are not limited to the public Web: they can also be used to capture private knowledge in private company Webs, making them potentially applicable as the backbone for future PLM solutions.
Enforcing Schemas with Kafka Connect | David Navalho, Marionete and Anatol Lu... | Hosted by Confluent
"Applying some measure of governance over how schemas are managed helps ensure good quality data, as well as better lineage tracking and governance.
At Saxo, we have been on a journey to take control of how we manage our data through the use of rich, governed schemas. We hit a challenge when we wanted to ingest data with Kafka Connect, as there was no way to ensure the data coming through was matched with these existing schemas. We were left having to either build a second step of manual transformations for simply matching generic data into our internal schemas, or play a lengthy game of cat and mouse with Connect exceptions and complex per-field transformations.
During this talk, we will be presenting how we tackled this issue by developing our own Schema Matching transformation. Our SMT can automatically match fields into a referenced schema. We will go through our experience designing the solution, and some of the key findings developing the SMT for both Avro and Protobuf."
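To make the approach concrete, the following is a minimal, hypothetical sketch of a schema-matching single message transform (SMT) for Kafka Connect. It is not the Saxo implementation: the class name and the hard-coded target schema are invented for illustration, and a real version would resolve the referenced schema from configuration or a schema registry. The sketch copies only the fields declared by the target schema out of a schemaless Map value and re-emits the record as a Struct carrying that schema.

// Hypothetical schema-matching SMT sketch (Java, Kafka Connect transforms API).
package com.example.kafka.transforms;

import java.util.Map;

import org.apache.kafka.common.config.ConfigDef;
import org.apache.kafka.connect.connector.ConnectRecord;
import org.apache.kafka.connect.data.Field;
import org.apache.kafka.connect.data.Schema;
import org.apache.kafka.connect.data.SchemaBuilder;
import org.apache.kafka.connect.data.Struct;
import org.apache.kafka.connect.transforms.Transformation;

public class SchemaMatch<R extends ConnectRecord<R>> implements Transformation<R> {

    // Target schema is hard-coded here; a production SMT would look it up by reference.
    // Fields are optional so records missing a field still pass through with nulls.
    private static final Schema TARGET_SCHEMA = SchemaBuilder.struct()
            .name("com.example.Order")
            .field("order_id", Schema.OPTIONAL_STRING_SCHEMA)
            .field("amount", Schema.OPTIONAL_FLOAT64_SCHEMA)
            .build();

    @Override
    public R apply(R record) {
        if (!(record.value() instanceof Map)) {
            return record; // nothing to match against; pass the record through unchanged
        }
        @SuppressWarnings("unchecked")
        Map<String, Object> value = (Map<String, Object>) record.value();

        // Keep only the fields declared by the target schema; ignore any extras.
        Struct matched = new Struct(TARGET_SCHEMA);
        for (Field field : TARGET_SCHEMA.fields()) {
            matched.put(field, value.get(field.name()));
        }

        return record.newRecord(record.topic(), record.kafkaPartition(),
                record.keySchema(), record.key(),
                TARGET_SCHEMA, matched, record.timestamp());
    }

    @Override
    public ConfigDef config() {
        return new ConfigDef();
    }

    @Override
    public void configure(Map<String, ?> configs) {
        // A real implementation would read the schema reference from the connector config here.
    }

    @Override
    public void close() {
    }
}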
Pentaho Data Integration: preparing and blending data from any source for analytics, thus enabling data-driven decision making. Applications in education, especially academic and learning analytics.
Data virtualization, Data Federation & IaaS with JBoss Teiid | Anil Allewar
Enterprises have always grappled with the problem of information silos that needed to be merged using multiple data warehouses (DWs) and business intelligence (BI) tools so that the disparate data could be mined for business decisions and strategy. Traditionally this data integration was done with ETL, consolidating multiple DBMSs into a single data storage facility.
Data virtualization enables abstraction, transformation, federation, and delivery of data taken from a variety of heterogeneous data sources as if it were a single virtual data source, without the need to physically copy the data for integration. It allows consuming applications or users to access data from these various sources via a request to a single access point and delivers information-as-a-service (IaaS).
In this presentation, we will explore what data virtualization is and how it differs from the traditional data integration architecture. We'll also validate the data virtualization and federation concepts by working through an example (see the videos at the GitHub repo) that federates data across two heterogeneous data sources, MySQL and MongoDB, using the JBoss Teiid data virtualization platform.
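As a rough illustration of what the consuming side of such a setup can look like, the hypothetical JDBC snippet below queries a Teiid virtual database with a single SQL statement that joins a MySQL-backed view with a MongoDB-backed view. The VDB name, view names, port and credentials are assumptions for the sketch; the actual definitions live in the talk's GitHub repo.

// Hypothetical client-side query against a deployed Teiid VDB (Java, plain JDBC).
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class TeiidFederationQuery {
    public static void main(String[] args) throws Exception {
        // Assumes the Teiid JDBC driver jar is on the classpath.
        Class.forName("org.teiid.jdbc.TeiidDriver");

        // Teiid exposes the federated sources behind one JDBC endpoint.
        String url = "jdbc:teiid:federationVDB@mm://localhost:31000";
        try (Connection conn = DriverManager.getConnection(url, "user", "password");
             Statement stmt = conn.createStatement();
             // One query spans both sources; Teiid federates the work at query time,
             // so no data is copied into a warehouse beforehand.
             ResultSet rs = stmt.executeQuery(
                     "SELECT c.name, o.total FROM mysql_views.customers c " +
                     "JOIN mongo_views.orders o ON c.id = o.customer_id")) {
            while (rs.next()) {
                System.out.println(rs.getString(1) + " -> " + rs.getDouble(2));
            }
        }
    }
}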
Many companies today move mountains of data using ETL (extract, transform, load) technology. But data volumes are growing too large to move, customers are now expecting real-time data, and ETL costs now account for 10-15% of computing capacity. In this slide presentation, you can see how data virtualization enables data structures that were designed independently to be leveraged together, in real time, and without data movement, reducing complexity, lowering IT costs, and minimizing risk.
Oracle Data Integrator (ODI) online training is provided at Glory IT Technologies. You will learn how to create the ODI topology, design ODI interfaces, packages and procedures, and organize ODI models and other objects. Every student will learn how to manage projects in ODI to develop interfaces and objects. Our ODI training takes students through some of the more advanced features of Oracle Data Integrator.
Pentaho Data Integration: extracting, integrating, normalizing and preparing ... | Alex Rayón Jerez
A Pentaho Data Integration session delivered in November 2015 as part of the Big Data and Business Intelligence Program at the Universidad de Deusto (details here: http://bit.ly/1PhIVgJ).
The Big Data Analytics Ecosystem at LinkedIn | rajappaiyer
LinkedIn has several data driven products that improve the experience of its users -- whether they are professionals or enterprises. Supporting this is a large ecosystem of systems and processes that provide data and insights in a timely manner to the products that are driven by it.
This talk provides an overview of the various components of this ecosystem which are:
- Hadoop
- Teradata
- Kafka
- Databus
- Camus
- Lumos
etc.
As technology and needs evolve and the demand for scalable, highly available solutions increases, there is a need to evaluate new databases. The lack of clarity in the market makes it difficult for IT stakeholders to understand the differences between the available solutions and which choice to make. The key areas to consider while evaluating NoSQL databases are the data model, query model, consistency model, APIs, support and community strength.
Continuous delivery with GitHub and Windows Azure | Luis Rudge
This is not a talk about whether continuous delivery is cool or not. If you also believe that whatever is in the main branch can be published automatically, join in and learn how to do it easily using GitHub and Windows Azure.
Gopi
singamsettigopi21@gmail.com
+ 91-8970777649
Experience Summary
• Overall 3+ years of experience in the implementation of data warehousing projects with Teradata.
• Working at Tata Consultancy Services Ltd (Bangalore) as a Software Engineer from July 2013 till date.
Qualification
• B.Tech (2012) in Electrical and Electronics Engineering from Prakasam Engineering College.
Professional Skills
• Good experience with Teradata utilities such as BTEQ, FastLoad, MultiLoad, TPump and FastExport.
• Extensive experience in loading data into Teradata from flat files using FastLoad scripts.
• Worked extensively with Teradata SQL Assistant.
• Good understanding of data warehousing concepts.
• Performed error handling and performance tuning of Teradata queries.
• Performed data reconciliation across various source systems and in Teradata.
• Worked with EXPLAIN and COLLECT STATISTICS.
• Worked with ET and UV tables.
• Addressed ad hoc requests from clients.
• Used sub-queries, joins, set operations, functions and advanced OLAP functions extensively.
• Prepared unit test specification requirements.
Core Competencies
Databases : Teradata 12/13/14, ORACLE
Languages : SQL, C
Operating systems : UNIX, Windows XP, 2003
Tools : Teradata SQL Assistant
Scheduling Tool : WLM (Work Load Manager)
Ticketing Tool : Tivoli
Projects Handled/Recent Accomplishments:
Project #1:
Project : Blackhawk Network Financial Data Reporting System
Client : Blackhawk Network Financial Services
Role : Teradata Developer
Team Size : 8
Environment : Teradata utilities (Fast Load, MLOAD, BTEQ) and TD SQL Assistant 13, UNIX, Tivoli
Duration : Mar 2015 – till date
Project Description:
The Blackhawk Network Financial Data Reporting System provides different types of commercial loans to prospective customers in a short span of time. The objective is to achieve a single point of reference for company, contact, process and deal data from various databases. Distributed data residing in heterogeneous data sources is consolidated into the target Teradata database. The project involved creating the data warehouse, analyzing the source data, and then deciding on the appropriate extraction, transformation and loading strategy. Subsequently, the data flow from the data sources to the target tables was created along with the utilities.
Roles and Responsibilities:
Analyzed the specifications provided by the clients.
Created scripts for Teradata utilities such as FastLoad, MultiLoad, FastExport and BTEQ.
Wrote hundreds of DDL scripts to create tables, views and indexes in the company data warehouse.
Reduced Teradata space usage by optimizing tables, adding compression where appropriate and ensuring optimum column definitions.
Loaded data into Teradata from legacy systems and flat files using complex MultiLoad and FastLoad scripts.
Analyzed table and index selection for data and access paths, including primary and secondary indexes.
Modified queries to use Teradata features for performance improvement.
Worked with EXPLAIN and COLLECT STATISTICS (see the sketch below).
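As an illustration of the compression and statistics work referred to above, the sketch below issues the kind of DDL and COLLECT STATISTICS statements involved, wrapped in Java JDBC only to keep the example self-contained. The table, columns, compression values and connection details are hypothetical; the project itself used DDL and BTEQ scripts rather than Java.

// Hypothetical sketch: multi-value compression and statistics collection on Teradata via JDBC.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class TeradataSpaceTuning {
    public static void main(String[] args) throws Exception {
        // Assumes the Teradata JDBC driver (terajdbc4.jar) is on the classpath.
        Class.forName("com.teradata.jdbc.TeraDriver");
        String url = "jdbc:teradata://tdhost/DATABASE=finance_dw";
        try (Connection conn = DriverManager.getConnection(url, "user", "password");
             Statement stmt = conn.createStatement()) {

            // Multi-value compression on a low-cardinality column reduces perm space;
            // the primary index controls row distribution and the main access path.
            stmt.executeUpdate(
                "CREATE TABLE loan_txn (" +
                "  txn_id     BIGINT NOT NULL," +
                "  loan_type  VARCHAR(10) COMPRESS ('AUTO','HOME','SMALL')," +
                "  txn_amount DECIMAL(18,2)" +
                ") PRIMARY INDEX (txn_id)");

            // Collected statistics give the optimizer accurate demographics,
            // which is what the EXPLAIN plans are checked against.
            stmt.executeUpdate("COLLECT STATISTICS COLUMN (txn_id) ON loan_txn");
        }
    }
}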
Project #2:
Project : Customer Enterprise Data Warehouse
Client : Verizon, UK
Role : Teradata Developer
Team Size : 12
Environment : Teradata utilities (Fast Load, MLOAD, BTEQ) and TD SQL Assistant 13, UNIX, WLM, Tivoli
Duration : Aug 2013 – Feb 2015
Project Description:
This project is to design and construct a Billing Mart for the Customer Enterprise Data Warehouse. The objective is to achieve a single point of reference for customer data from the various databases. Distributed data residing in heterogeneous data sources is consolidated into the target Teradata database. The project involved creating the data warehouse, analyzing the source data, and then deciding on the appropriate extraction, transformation and loading strategy. Subsequently, the data flow from the data sources to the target tables was created along with the required utilities.
Roles and Responsibilities:
Worked with different sources such as flat files and XML files.
Extensively worked on data extraction, transformation and loading from source to target systems using BTEQ, FastLoad and MultiLoad.
Worked exclusively with Teradata SQL Assistant.
Created BTEQ scripts to move data from staging to target.
Performed query optimization (EXPLAIN plans, COLLECT STATISTICS).
Worked with ET, UV and WT tables for error handling, mainly through Teradata SQL Assistant (see the sketch below).
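As an illustration of the ET/UV error-handling step, the hypothetical snippet below counts the rows left in the load error tables after a MultiLoad run. The database and table names follow the common ET_<target> / UV_<target> convention but are invented for the sketch; in the project this checking was done in Teradata SQL Assistant rather than Java.

// Hypothetical sketch: checking MultiLoad error tables after a load, via the Teradata JDBC driver.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class LoadErrorCheck {
    public static void main(String[] args) throws Exception {
        Class.forName("com.teradata.jdbc.TeraDriver");
        String url = "jdbc:teradata://tdhost/DATABASE=billing_mart";
        try (Connection conn = DriverManager.getConnection(url, "user", "password");
             Statement stmt = conn.createStatement()) {

            // ET holds rows rejected during acquisition (conversion/constraint errors),
            // UV holds uniqueness violations; a clean load leaves both empty.
            for (String errTable : new String[] {"ET_billing_stg", "UV_billing_stg"}) {
                try (ResultSet rs = stmt.executeQuery("SELECT COUNT(*) FROM " + errTable)) {
                    rs.next();
                    System.out.println(errTable + " rows: " + rs.getLong(1));
                }
            }
        }
    }
}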
(Gopi S)