Greg Chase, Director, Product Marketing presents Big Data 10 A
mazing Things to do With A Hadoop-based Data Lake at the Strata Conference + Hadoop World 2014 in NYC.
Alexandre Vasseur - Evolution of Data Architectures: From Hadoop to Data Lake...NoSQLmatters
Come to this deep dive on how Pivotal's Data Lake Vision is evolving by embracing next generation in-memory data exchange and compute technologies around Spark and Tachyon. Did we say Hadoop, SQL, and what's the shortest path to get from past to future state? The next generation of data lake technology will leverage the availability of in-memory processing, with an architecture that supports multiple data analytics workloads within a single environment: SQL, R, Spark, batch and transactional.
Verizon: Finance Data Lake implementation as a Self Service Discovery Big Dat...DataWorks Summit
Finance Data Lake objective is to create a centralized enterprise data repository for all Finance and Supply Chain data. It serves as the single source of truth. It enables a self-service discovery Analytics platform for business users to answer adhoc business questions and derive critical insights. The data lake is based on open source Hadoop big data platform and a very cost effective solution in breaking the ERP data silos and simplifying the data architecture in the enterprise.
POCs were conducted on in-house Hortonworks Hadoop data platform to validate the cluster performance for Production volumes. Based on business priorities, an initial roadmap was defined using 3 data sources including 2 SAP ERPs and Peoplesoft (OLTP systems). Development environment was established in AWS Cloud for agile delivery. The near real time data ingestion architecture for the data lake was defined using replication tools and custom SQOOP based micro-batching framework and data persisted in Apache Hive DB in ORC format. Data and user security is implemented using Apache Ranger and sensitive data stored at rest in encryption zones. Business data sets were developed in Hive scripts and scheduled using Oozie. Multiple reporting tools connectivity including SQL tools, Excel and Tableau were enabled for Self-service Analytics. Upon successful implementation of the initial phase, a full roadmap is established to extend the Finance data lake to over 25 data sources and enhance data ingestion to scale as well as enable OLAP tools on Hadoop.
Data Lake for the Cloud: Extending your Hadoop ImplementationHortonworks
As more applications are created using Apache Hadoop that derive value from the new types of data from sensors/machines, server logs, click-streams, and other sources, the enterprise "Data Lake" forms with Hadoop acting as a shared service. While these Data Lakes are important, a broader life-cycle needs to be considered that spans development, test, production, and archival and that is deployed across a hybrid cloud architecture.
If you have already deployed Hadoop on-premise, this session will also provide an overview of the key scenarios and benefits of joining your on-premise Hadoop implementation with the cloud, by doing backup/archive, dev/test or bursting. Learn how you can get the benefits of an on-premise Hadoop that can seamlessly scale with the power of the cloud.
Hortonworks and Red Hat Webinar_Sept.3rd_Part 1Hortonworks
As the enterprise's big data program matures and Apache Hadoop becomes more deeply embedded in critical operations, the ability to support and operate it efficiently and reliably becomes increasingly important. To aid enterprise in operating modern data architecture at scale, Red hat and Hortonworks have collaborated to integrate Hortonworks Data Platform with Red Hat's proven platform technologies. Join us in this interactive 3-part webinar series, as we'll demonstrate how Red Hat JBoss Data Virtualization can integrate with Hadoop through Hive and provide users easy access to data.
Big Data: Architecture and Performance Considerations in Logical Data LakesDenodo
This presentation explains in detail what a Data Lake Architecture looks like, how data virtualization fits into the Logical Data Lake, and goes over some performance tips. Also it includes an example demonstrating this model's performance.
This presentation is part of the Fast Data Strategy Conference, and you can watch the video here goo.gl/9Jwfu6.
Hortonworks and Clarity Solution Group Hortonworks
Many organizations are leveraging social media to understand consumer sentiment and opinions about brands and products. Analytics in this area, however, is in its infancy and does not always provide a compelling result for effective business impact. Learn how consumer organizations can benefit by integrating social data with enterprise data to drive more profitable consumer relationships. This webinar is presented by Hortonworks and Clarity Solution Group, and will focus on the evolution of Hadoop, the clear advantage of Hortonworks distribution, and business challenges solved by “Consumer720.”
Alexandre Vasseur - Evolution of Data Architectures: From Hadoop to Data Lake...NoSQLmatters
Come to this deep dive on how Pivotal's Data Lake Vision is evolving by embracing next generation in-memory data exchange and compute technologies around Spark and Tachyon. Did we say Hadoop, SQL, and what's the shortest path to get from past to future state? The next generation of data lake technology will leverage the availability of in-memory processing, with an architecture that supports multiple data analytics workloads within a single environment: SQL, R, Spark, batch and transactional.
Verizon: Finance Data Lake implementation as a Self Service Discovery Big Dat...DataWorks Summit
Finance Data Lake objective is to create a centralized enterprise data repository for all Finance and Supply Chain data. It serves as the single source of truth. It enables a self-service discovery Analytics platform for business users to answer adhoc business questions and derive critical insights. The data lake is based on open source Hadoop big data platform and a very cost effective solution in breaking the ERP data silos and simplifying the data architecture in the enterprise.
POCs were conducted on in-house Hortonworks Hadoop data platform to validate the cluster performance for Production volumes. Based on business priorities, an initial roadmap was defined using 3 data sources including 2 SAP ERPs and Peoplesoft (OLTP systems). Development environment was established in AWS Cloud for agile delivery. The near real time data ingestion architecture for the data lake was defined using replication tools and custom SQOOP based micro-batching framework and data persisted in Apache Hive DB in ORC format. Data and user security is implemented using Apache Ranger and sensitive data stored at rest in encryption zones. Business data sets were developed in Hive scripts and scheduled using Oozie. Multiple reporting tools connectivity including SQL tools, Excel and Tableau were enabled for Self-service Analytics. Upon successful implementation of the initial phase, a full roadmap is established to extend the Finance data lake to over 25 data sources and enhance data ingestion to scale as well as enable OLAP tools on Hadoop.
Data Lake for the Cloud: Extending your Hadoop ImplementationHortonworks
As more applications are created using Apache Hadoop that derive value from the new types of data from sensors/machines, server logs, click-streams, and other sources, the enterprise "Data Lake" forms with Hadoop acting as a shared service. While these Data Lakes are important, a broader life-cycle needs to be considered that spans development, test, production, and archival and that is deployed across a hybrid cloud architecture.
If you have already deployed Hadoop on-premise, this session will also provide an overview of the key scenarios and benefits of joining your on-premise Hadoop implementation with the cloud, by doing backup/archive, dev/test or bursting. Learn how you can get the benefits of an on-premise Hadoop that can seamlessly scale with the power of the cloud.
Hortonworks and Red Hat Webinar_Sept.3rd_Part 1Hortonworks
As the enterprise's big data program matures and Apache Hadoop becomes more deeply embedded in critical operations, the ability to support and operate it efficiently and reliably becomes increasingly important. To aid enterprise in operating modern data architecture at scale, Red hat and Hortonworks have collaborated to integrate Hortonworks Data Platform with Red Hat's proven platform technologies. Join us in this interactive 3-part webinar series, as we'll demonstrate how Red Hat JBoss Data Virtualization can integrate with Hadoop through Hive and provide users easy access to data.
Big Data: Architecture and Performance Considerations in Logical Data LakesDenodo
This presentation explains in detail what a Data Lake Architecture looks like, how data virtualization fits into the Logical Data Lake, and goes over some performance tips. Also it includes an example demonstrating this model's performance.
This presentation is part of the Fast Data Strategy Conference, and you can watch the video here goo.gl/9Jwfu6.
Hortonworks and Clarity Solution Group Hortonworks
Many organizations are leveraging social media to understand consumer sentiment and opinions about brands and products. Analytics in this area, however, is in its infancy and does not always provide a compelling result for effective business impact. Learn how consumer organizations can benefit by integrating social data with enterprise data to drive more profitable consumer relationships. This webinar is presented by Hortonworks and Clarity Solution Group, and will focus on the evolution of Hadoop, the clear advantage of Hortonworks distribution, and business challenges solved by “Consumer720.”
Hortonworks Oracle Big Data Integration Hortonworks
Slides from joint Hortonworks and Oracle webinar on November 11, 2014. Covers the Modern Data Architecture with Apache Hadoop and Oracle Data Integration products.
Presentation from Data Science Conference 2.0 held in Belgrade, Serbia. The focus of the talk was to address the challenges of deploying a Data Lake infrastructure within the organization.
Create a Smarter Data Lake with HP Haven and Apache HadoopHortonworks
An organization’s information is spread across multiple repositories, on-premise and in the cloud, with limited ability to correlate information and derive insights. The Smart Content Hub solution from HP and Hortonworks enables a shared content infrastructure that transparently synchronizes information with existing systems and offers an open standards-based platform for deep analysis and data monetization.
- Leverage 100% of your data: Text, images, audio, video, and many more data types can be automatically consumed and enriched using HP Haven (powered by HP IDOL and HP Vertica), making it possible to integrate this valuable content and insights into various line of business applications.
- Democratize and enable multi-dimensional content analysis: - Empower your analysts, business users, and data scientists to search and analyze Hadoop data with ease, using the 100% open source Hortonworks Data Platform.
- Extend the enterprise data warehouse: Synchronize and manage content from content management systems, and crack open the files in whatever format they happen to be in.
- Dramatically reduce complexity with enterprise-ready SQL engine: Tap into the richest analytics that support JOINs, complex data types, and other capabilities only available with HP Vertica SQL on the Hortonworks Data Platform.
Speakers:
- Ajay Singh, Director, Technical Channels, Hortonworks
- Will Gardella, Product Management, HP Big Data
Mr. Slim Baltagi is a Systems Architect at Hortonworks, with over 4 years of Hadoop experience working on 9 Big Data projects: Advanced Customer Analytics, Supply Chain Analytics, Medical Coverage Discovery, Payment Plan Recommender, Research Driven Call List for Sales, Prime Reporting Platform, Customer Hub, Telematics, Historical Data Platform; with Fortune 100 clients and global companies from Financial Services, Insurance, Healthcare and Retail.
Mr. Slim Baltagi has worked in various architecture, design, development and consulting roles at.
Accenture, CME Group, TransUnion, Syntel, Allstate, TransAmerica, Credit Suisse, Chicago Board Options Exchange, Federal Reserve Bank of Chicago, CNA, Sears, USG, ACNielsen, Deutshe Bahn.
Mr. Baltagi has also over 14 years of IT experience with an emphasis on full life cycle development of Enterprise Web applications using Java and Open-Source software. He holds a master’s degree in mathematics and is an ABD in computer science from Université Laval, Québec, Canada.
Languages: Java, Python, JRuby, JEE , PHP, SQL, HTML, XML, XSLT, XQuery, JavaScript, UML, JSON
Databases: Oracle, MS SQL Server, MYSQL, PostreSQL
Software: Eclipse, IBM RAD, JUnit, JMeter, YourKit, PVCS, CVS, UltraEdit, Toad, ClearCase, Maven, iText, Visio, Japser Reports, Alfresco, Yslow, Terracotta, Toad, SoapUI, Dozer, Sonar, Git
Frameworks: Spring, Struts, AppFuse, SiteMesh, Tiles, Hibernate, Axis, Selenium RC, DWR Ajax , Xstream
Distributed Computing/Big Data: Hadoop, MapReduce, HDFS, Hive, Pig, Sqoop, HBase, R, RHadoop, Cloudera CDH4, MapR M7, Hortonworks HDP 2.1
The Next Generation of Big Data AnalyticsHortonworks
Apache Hadoop has evolved rapidly to become a leading platform for managing and processing big data. If your organization is examining how you can use Hadoop to store, transform, and refine large volumes of multi-structured data, please join us for this session where we will discuss, the emergence of "big data" and opportunities for deriving business value, the evolution of Apache Hadoop and future directions, essential components required in a Hadoop-powered platform, and solution architectures that integrate Hadoop with existing data discovery and data warehouse platforms.
Hadoop based data Lakes have become increasingly popular within today’s modern data architectures for their ability to scale, handle data variety and low cost. Many organizations start slow with the data lake initiatives but as they grow bigger, they suffer with challenges on data consistency, quality and security, resulting in losing confidence in their data lake initiatives.
This talk will discuss the need for good data governance mechanisms for Hadoop data lakes and it relationship with productivity and how it helps organizations meet regulatory and compliance requirements. The talk advocates carrying a different mindset for designing and implementing flexible governance mechanisms on Hadoop data lakes.
It Takes a Village: Organizational Alignment to Deliver Big Data Value in Hea...DataWorks Summit
The business and technology teams within a health insurer must align the company’s central data platform with its data strategy. That requires substantial organizational alignment. Hear the firsthand perspective from Health Care Service Corporation (HCSC), the largest customer-owned health insurance company in the United States. The speaker will cover how they integrated membership information, regulatory compliance, and the general ledger, to improve overall healthcare management. At HCSC, the strong alignment between executive leadership, business portfolio direction, architectural strategy, technology delivery, and program management have helped create leading-edge capabilities which help the company respond nimbly to a quickly evolving healthcare industry.
Combine Apache Hadoop and Elasticsearch to Get the Most of Your Big DataHortonworks
Hadoop is a great platform for storing and processing massive amounts of data. Elasticsearch is the ideal solution for Searching and Visualizing the same data. Join us to learn how you can leverage the full power of both platforms to maximize the value of your Big Data.
In this webinar we'll walk you through:
How Elasticsearch fits in the Modern Data Architecture.
A demo of Elasticsearch and Hortonworks Data Platform.
Best practices for combining Elasticsearch and Hortonworks Data Platform to extract maximum insights from your data.
Hadoop Reporting and Analysis - JaspersoftHortonworks
Hadoop is deployed for a variety of uses, including web analytics, fraud detection, security monitoring, healthcare, environmental analysis, social media monitoring, and other purposes.
Optimizing your Modern Data Architecture - with Attunity, RCG Global Services...Hortonworks
Today’s enterprises are challenged with capturing large amounts of data from a number of sources in a variety of formats, and then storing it in a cost-effective, timely manner. With your current data warehouse, this may seem overwhelming. It doesn’t have to be. With a Hadoop-based modern data warehouse, you can overcome these challenges and get meaningful insights from real-time data.
Want to learn how? Join experts from Attunity, Hortonworks, and RCG Global Services for a live webinar - where we will be discussing enterprise data warehouse optimization. You will learn how to:
•Rebalance your data warehouse by identifying unused data and resource-intensive workloads that can be moved to Hadoop.
•Seamlessly integrate your current enterprise data warehouse with a Modern Data Architecture.
•Better utilize data assets to reduce costs while realizing more value from your data.
•Develop a roadmap for implementing the Hadoop-based Modern Data Architecture and Data Lake.
More and more organizations are moving their ETL workloads to a Hadoop based ELT grid architecture. Hadoop`s inherit capabilities, especially it`s ability to do late binding addresses some of the key challenges with traditional ETL platforms. In this presentation, attendees will learn the key factors, considerations and lessons around ETL for Hadoop. Areas such as pros and cons for different extract and load strategies, best ways to batch data, buffering and compression considerations, leveraging HCatalog, data transformation, integration with existing data transformations, advantages of different ways of exchanging data and leveraging Hadoop as a data integration layer. This is an extremely popular presentation around ETL and Hadoop.
Oncrawl elasticsearch meetup france #12Tanguy MOAL
Presentation detailing how Elasticsearch is involved in Oncrawl, a SaaS solution for easy SEO monitoring.
The presentation explains how the application is built, and how it integrates Elasticsearch, a powerful general purpose search engine.
Oncrawl is data centric and elasticsearch is used as an analytics engine rather than a full text search engine.
The application uses Apache Hadoop and Apache Nutch for the crawl pipeline and data analysis.
Oncrawl is a Cogniteev solution.
Building the Enterprise Data Lake: A look at architecturemark madsen
The topic is building an Enterprise Data Lake, discussing high level data and technology architecture. We will describe the architecture of a data warehouse, how a data lake needs to differ, and show a high level functional and data architecture for a data lake. This webinar will cover:
Why dumping data into Hadoop and letting users get it out doesn't work
The difference between a Hadoop application and a Data Lake
Why new ideas about data architecture are a key element
An Enterprise Data Lake reference architecture to frame what must be built
Trafodion – an enterprise class sql based on hadoopKrishna-Kumar
Trafodion is a joint HP Labs and HP-IT research project to develop an enterprise-class SQL on Hadoop DBMS engine that specifically targets operational workloads as opposed to analytic workloads. Operational SQL describe workloads previous described as OLTP (online transaction processing) workloads and Operational Data Store (ODS) workloads, but expands that definition from the broad range of enterprise-level transactional applications (ERP, CRM, etc.) to include the new transactions generated from social and mobile data interactions and observations and the new mixing of structured and semi-structured data.
Hortonworks Oracle Big Data Integration Hortonworks
Slides from joint Hortonworks and Oracle webinar on November 11, 2014. Covers the Modern Data Architecture with Apache Hadoop and Oracle Data Integration products.
Presentation from Data Science Conference 2.0 held in Belgrade, Serbia. The focus of the talk was to address the challenges of deploying a Data Lake infrastructure within the organization.
Create a Smarter Data Lake with HP Haven and Apache HadoopHortonworks
An organization’s information is spread across multiple repositories, on-premise and in the cloud, with limited ability to correlate information and derive insights. The Smart Content Hub solution from HP and Hortonworks enables a shared content infrastructure that transparently synchronizes information with existing systems and offers an open standards-based platform for deep analysis and data monetization.
- Leverage 100% of your data: Text, images, audio, video, and many more data types can be automatically consumed and enriched using HP Haven (powered by HP IDOL and HP Vertica), making it possible to integrate this valuable content and insights into various line of business applications.
- Democratize and enable multi-dimensional content analysis: - Empower your analysts, business users, and data scientists to search and analyze Hadoop data with ease, using the 100% open source Hortonworks Data Platform.
- Extend the enterprise data warehouse: Synchronize and manage content from content management systems, and crack open the files in whatever format they happen to be in.
- Dramatically reduce complexity with enterprise-ready SQL engine: Tap into the richest analytics that support JOINs, complex data types, and other capabilities only available with HP Vertica SQL on the Hortonworks Data Platform.
Speakers:
- Ajay Singh, Director, Technical Channels, Hortonworks
- Will Gardella, Product Management, HP Big Data
Mr. Slim Baltagi is a Systems Architect at Hortonworks, with over 4 years of Hadoop experience working on 9 Big Data projects: Advanced Customer Analytics, Supply Chain Analytics, Medical Coverage Discovery, Payment Plan Recommender, Research Driven Call List for Sales, Prime Reporting Platform, Customer Hub, Telematics, Historical Data Platform; with Fortune 100 clients and global companies from Financial Services, Insurance, Healthcare and Retail.
Mr. Slim Baltagi has worked in various architecture, design, development and consulting roles at.
Accenture, CME Group, TransUnion, Syntel, Allstate, TransAmerica, Credit Suisse, Chicago Board Options Exchange, Federal Reserve Bank of Chicago, CNA, Sears, USG, ACNielsen, Deutshe Bahn.
Mr. Baltagi has also over 14 years of IT experience with an emphasis on full life cycle development of Enterprise Web applications using Java and Open-Source software. He holds a master’s degree in mathematics and is an ABD in computer science from Université Laval, Québec, Canada.
Languages: Java, Python, JRuby, JEE , PHP, SQL, HTML, XML, XSLT, XQuery, JavaScript, UML, JSON
Databases: Oracle, MS SQL Server, MYSQL, PostreSQL
Software: Eclipse, IBM RAD, JUnit, JMeter, YourKit, PVCS, CVS, UltraEdit, Toad, ClearCase, Maven, iText, Visio, Japser Reports, Alfresco, Yslow, Terracotta, Toad, SoapUI, Dozer, Sonar, Git
Frameworks: Spring, Struts, AppFuse, SiteMesh, Tiles, Hibernate, Axis, Selenium RC, DWR Ajax , Xstream
Distributed Computing/Big Data: Hadoop, MapReduce, HDFS, Hive, Pig, Sqoop, HBase, R, RHadoop, Cloudera CDH4, MapR M7, Hortonworks HDP 2.1
The Next Generation of Big Data AnalyticsHortonworks
Apache Hadoop has evolved rapidly to become a leading platform for managing and processing big data. If your organization is examining how you can use Hadoop to store, transform, and refine large volumes of multi-structured data, please join us for this session where we will discuss, the emergence of "big data" and opportunities for deriving business value, the evolution of Apache Hadoop and future directions, essential components required in a Hadoop-powered platform, and solution architectures that integrate Hadoop with existing data discovery and data warehouse platforms.
Hadoop based data Lakes have become increasingly popular within today’s modern data architectures for their ability to scale, handle data variety and low cost. Many organizations start slow with the data lake initiatives but as they grow bigger, they suffer with challenges on data consistency, quality and security, resulting in losing confidence in their data lake initiatives.
This talk will discuss the need for good data governance mechanisms for Hadoop data lakes and it relationship with productivity and how it helps organizations meet regulatory and compliance requirements. The talk advocates carrying a different mindset for designing and implementing flexible governance mechanisms on Hadoop data lakes.
It Takes a Village: Organizational Alignment to Deliver Big Data Value in Hea...DataWorks Summit
The business and technology teams within a health insurer must align the company’s central data platform with its data strategy. That requires substantial organizational alignment. Hear the firsthand perspective from Health Care Service Corporation (HCSC), the largest customer-owned health insurance company in the United States. The speaker will cover how they integrated membership information, regulatory compliance, and the general ledger, to improve overall healthcare management. At HCSC, the strong alignment between executive leadership, business portfolio direction, architectural strategy, technology delivery, and program management have helped create leading-edge capabilities which help the company respond nimbly to a quickly evolving healthcare industry.
Combine Apache Hadoop and Elasticsearch to Get the Most of Your Big DataHortonworks
Hadoop is a great platform for storing and processing massive amounts of data. Elasticsearch is the ideal solution for Searching and Visualizing the same data. Join us to learn how you can leverage the full power of both platforms to maximize the value of your Big Data.
In this webinar we'll walk you through:
How Elasticsearch fits in the Modern Data Architecture.
A demo of Elasticsearch and Hortonworks Data Platform.
Best practices for combining Elasticsearch and Hortonworks Data Platform to extract maximum insights from your data.
Hadoop Reporting and Analysis - JaspersoftHortonworks
Hadoop is deployed for a variety of uses, including web analytics, fraud detection, security monitoring, healthcare, environmental analysis, social media monitoring, and other purposes.
Optimizing your Modern Data Architecture - with Attunity, RCG Global Services...Hortonworks
Today’s enterprises are challenged with capturing large amounts of data from a number of sources in a variety of formats, and then storing it in a cost-effective, timely manner. With your current data warehouse, this may seem overwhelming. It doesn’t have to be. With a Hadoop-based modern data warehouse, you can overcome these challenges and get meaningful insights from real-time data.
Want to learn how? Join experts from Attunity, Hortonworks, and RCG Global Services for a live webinar - where we will be discussing enterprise data warehouse optimization. You will learn how to:
•Rebalance your data warehouse by identifying unused data and resource-intensive workloads that can be moved to Hadoop.
•Seamlessly integrate your current enterprise data warehouse with a Modern Data Architecture.
•Better utilize data assets to reduce costs while realizing more value from your data.
•Develop a roadmap for implementing the Hadoop-based Modern Data Architecture and Data Lake.
More and more organizations are moving their ETL workloads to a Hadoop based ELT grid architecture. Hadoop`s inherit capabilities, especially it`s ability to do late binding addresses some of the key challenges with traditional ETL platforms. In this presentation, attendees will learn the key factors, considerations and lessons around ETL for Hadoop. Areas such as pros and cons for different extract and load strategies, best ways to batch data, buffering and compression considerations, leveraging HCatalog, data transformation, integration with existing data transformations, advantages of different ways of exchanging data and leveraging Hadoop as a data integration layer. This is an extremely popular presentation around ETL and Hadoop.
Oncrawl elasticsearch meetup france #12Tanguy MOAL
Presentation detailing how Elasticsearch is involved in Oncrawl, a SaaS solution for easy SEO monitoring.
The presentation explains how the application is built, and how it integrates Elasticsearch, a powerful general purpose search engine.
Oncrawl is data centric and elasticsearch is used as an analytics engine rather than a full text search engine.
The application uses Apache Hadoop and Apache Nutch for the crawl pipeline and data analysis.
Oncrawl is a Cogniteev solution.
Building the Enterprise Data Lake: A look at architecturemark madsen
The topic is building an Enterprise Data Lake, discussing high level data and technology architecture. We will describe the architecture of a data warehouse, how a data lake needs to differ, and show a high level functional and data architecture for a data lake. This webinar will cover:
Why dumping data into Hadoop and letting users get it out doesn't work
The difference between a Hadoop application and a Data Lake
Why new ideas about data architecture are a key element
An Enterprise Data Lake reference architecture to frame what must be built
Trafodion – an enterprise class sql based on hadoopKrishna-Kumar
Trafodion is a joint HP Labs and HP-IT research project to develop an enterprise-class SQL on Hadoop DBMS engine that specifically targets operational workloads as opposed to analytic workloads. Operational SQL describe workloads previous described as OLTP (online transaction processing) workloads and Operational Data Store (ODS) workloads, but expands that definition from the broad range of enterprise-level transactional applications (ERP, CRM, etc.) to include the new transactions generated from social and mobile data interactions and observations and the new mixing of structured and semi-structured data.
Hortonworks and Red Hat Webinar - Part 2Hortonworks
Learn more about creating reference architectures that optimize the delivery the Hortonworks Data Platform. You will hear more about Hive, JBoss Data Virtualization Security, and you will also see in action how to combine sentiment data from Hadoop with data from traditional relational sources.
The world’s largest enterprises run their infrastructure on Oracle, DB2 and SQL and their critical business operations on SAP applications. Organisations need this data to be available in real-time to conduct necessary analytics. However, delivering this heterogeneous data at the speed it’s required can be a huge challenge because of the complex underlying data models and structures and legacy manual processes which are prone to errors and delays.
Unlock these silos of data and enable the new advanced analytics platforms by attending this session.
Find out how to:
• To overcome common challenges faced by enterprises trying to access their SAP data
• You can integrate SAP data in real-time with change data capture (CDC) technology
• Organisations are using Attunity Replicate for SAP to stream SAP data in to Kafka
The Briefing Room with William McKnight and Actian
Live Webcast on October 14, 2014
Watch the archive:
https://bloorgroup.webex.com/bloorgroup/lsr.php?RCID=135528d85baa96a07850bd35961d459d
Integrating Hadoop with existing data sources, workflows and analytics can be a real challenge. While some components, like Hive and Spark, can give SQL access to Hadoop data, there isn’t much that enables Hadoop to be treated as a genuine BI and analytics platform, capable of running multiple jobs that serve multiple users and multiple applications. But what if you could turn Hadoop into a versatile, high performance development platform, forgoing all the pain of figuring out how and where to manage big data?
Register for this episode of The Briefing Room to hear veteran Analyst William McKnight as he discusses the fairly swift evolution of Hadoop’s capabilities. He’ll be briefed by Jim Hare of Actian, who will tout his company’s latest addition to its Analytic Platform: Hadoop SQL Edition. He will show how Actian has leveraged Hadoop and its scale out file system to create a fully functioning platform, providing everything from an analytic database to machine learning.
Visit InsideAnlaysis.com for more information.
Hortonworks Data In Motion Webinar Series Pt. 2Hortonworks
How Hortonworks DataFlow (HDF), powered by Apache NIFi, MiNiFi, Kafka and Storm, and it’s associated HDF Certification Program make it easier and faster to integrate different systems together. Highlights on the latest partner integrations from HPE, SAS, Attunity, Impetus Technologies, Kepware and Midfin Systems. “
Watch the webinar on-demand: http://hortonworks.com/webinar/make-big-data-ecosystem-work-better/
HDF Partner certification program: http://hortonworks.com/partners/product-integration-certification/#hdf-integration
Discover HDP 2.2: Even Faster SQL Queries with Apache Hive and Stinger.nextHortonworks
Earlier this year, the Apache open source community delivered the Stinger Initiative to improve speed, scale and SQL semantics in Apache Hive. Now Stinger.next is underway, to build on those initial successes.
In this presentation, from a webinar hosted by Hortonworks co-founder Alan Gates and Hortonworks Hive product manager Raj Baines, you can learn more about Stinger.next and innovation in Apache Hive.
Alan and Raj cover new Hive functionality for more speed, scale and SQL in HDP 2.2. Specific topics include transactions with ACID semantics, the cost based optimizer and dynamic query optimizations.
The presentation also shows future plans for the Stinger.next initiative.
Discover hdp 2.2: Data storage innovations in Hadoop Distributed Filesystem (...Hortonworks
Hortonworks Data Platform 2.2 include HDFS for data storage . In this 30-minute webinar, we discussed data storage innovations, including Heterogeneous storage, encryption, and operational security enhancements.
These slides to the Discover HDP 2.2 Webinar Series: Data Storage Innovations in HDFS explore Heterogeneous storage, Data Encryption and Operational security.
This is the presentation from the "Discover HDP 2.1: Apache Hadoop 2.4.0, YARN & HDFS" webinar on May 28, 2014. Rohit Bahkshi, a senior product manager at Hortonworks, and Vinod Vavilapalli, PMC for Apache Hadoop, discuss an overview of YARN in HDFS and new features in HDP 2.1. Those new features include: HDFS extended ACLs, HTTPs wire encryption, HDFS DataNode caching, resource manager high availability, application timeline server, and capacity scheduler pre-emption.
Starting Small and Scaling Big with Hadoop (Talend and Hortonworks webinar)) ...Hortonworks
No matter if you are new to Hadoop or have a mature cluster in production, scale will be a critical factor of your success with Hadoop. Are you ready to take the next big step as you scale out your data architecture?
Talend and Hortonworks discuss where we will help you learn how to implement an effective big data and Hadoop strategy across your IT infrastructure. You will learn:
How to grow a pilot into production
How to scale-out architecture & systems affordably
How to leverage the flexibility of Hadoop to optimize your data integration processes
Recording: http://www.talend.com/resources/webinars/starting-small-and-scaling-big-with-hadoop
A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks ...Hortonworks
Companies in every industry look for ways to explore new data types and large data sets that were previously too big to capture, store and process. They need to unlock insights from data such as clickstream, geo-location, sensor, server log, social, text and video data. However, becoming a data-first enterprise comes with many challenges.
Join this webinar organized by three leaders in their respective fields and learn from our experts how you can accelerate the implementation of a scalable, cost-efficient and robust Big Data solution. Cisco, Hortonworks and Red Hat will explore how new data sets can enrich existing analytic applications with new perspectives and insights and how they can help you drive the creation of innovative new apps that provide new value to your business.
Hp Converged Systems and Hortonworks - Webinar SlidesHortonworks
Our experts will walk you through some key design considerations when deploying a Hadoop cluster in production. We'll also share practical best practices around HP and Hortonworks Data Platform to get you started on building your modern data architecture.
Learn how to:
- Leverage best practices for deployment
- Choose a deployment model
- Design your Hadoop cluster
- Build a Modern Data Architecture and vision for the Data Lake
The Tanzu Developer Connect is a hands-on workshop that dives deep into TAP. Attendees receive a hands on experience. This is a great program to leverage accounts with current TAP opportunities.
The Tanzu Developer Connect is a hands-on workshop that dives deep into TAP. Attendees receive a hands on experience. This is a great program to leverage accounts with current TAP opportunities.
Essentials of Automations: Optimizing FME Workflows with ParametersSafe Software
Are you looking to streamline your workflows and boost your projects’ efficiency? Do you find yourself searching for ways to add flexibility and control over your FME workflows? If so, you’re in the right place.
Join us for an insightful dive into the world of FME parameters, a critical element in optimizing workflow efficiency. This webinar marks the beginning of our three-part “Essentials of Automation” series. This first webinar is designed to equip you with the knowledge and skills to utilize parameters effectively: enhancing the flexibility, maintainability, and user control of your FME projects.
Here’s what you’ll gain:
- Essentials of FME Parameters: Understand the pivotal role of parameters, including Reader/Writer, Transformer, User, and FME Flow categories. Discover how they are the key to unlocking automation and optimization within your workflows.
- Practical Applications in FME Form: Delve into key user parameter types including choice, connections, and file URLs. Allow users to control how a workflow runs, making your workflows more reusable. Learn to import values and deliver the best user experience for your workflows while enhancing accuracy.
- Optimization Strategies in FME Flow: Explore the creation and strategic deployment of parameters in FME Flow, including the use of deployment and geometry parameters, to maximize workflow efficiency.
- Pro Tips for Success: Gain insights on parameterizing connections and leveraging new features like Conditional Visibility for clarity and simplicity.
We’ll wrap up with a glimpse into future webinars, followed by a Q&A session to address your specific questions surrounding this topic.
Don’t miss this opportunity to elevate your FME expertise and drive your projects to new heights of efficiency.
Accelerate your Kubernetes clusters with Varnish CachingThijs Feryn
A presentation about the usage and availability of Varnish on Kubernetes. This talk explores the capabilities of Varnish caching and shows how to use the Varnish Helm chart to deploy it to Kubernetes.
This presentation was delivered at K8SUG Singapore. See https://feryn.eu/presentations/accelerate-your-kubernetes-clusters-with-varnish-caching-k8sug-singapore-28-2024 for more details.
Connector Corner: Automate dynamic content and events by pushing a buttonDianaGray10
Here is something new! In our next Connector Corner webinar, we will demonstrate how you can use a single workflow to:
Create a campaign using Mailchimp with merge tags/fields
Send an interactive Slack channel message (using buttons)
Have the message received by managers and peers along with a test email for review
But there’s more:
In a second workflow supporting the same use case, you’ll see:
Your campaign sent to target colleagues for approval
If the “Approve” button is clicked, a Jira/Zendesk ticket is created for the marketing design team
But—if the “Reject” button is pushed, colleagues will be alerted via Slack message
Join us to learn more about this new, human-in-the-loop capability, brought to you by Integration Service connectors.
And...
Speakers:
Akshay Agnihotri, Product Manager
Charlie Greenberg, Host
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Tobias Schneck
As AI technology is pushing into IT I was wondering myself, as an “infrastructure container kubernetes guy”, how get this fancy AI technology get managed from an infrastructure operational view? Is it possible to apply our lovely cloud native principals as well? What benefit’s both technologies could bring to each other?
Let me take this questions and provide you a short journey through existing deployment models and use cases for AI software. On practical examples, we discuss what cloud/on-premise strategy we may need for applying it to our own infrastructure to get it to work from an enterprise perspective. I want to give an overview about infrastructure requirements and technologies, what could be beneficial or limiting your AI use cases in an enterprise environment. An interactive Demo will give you some insides, what approaches I got already working for real.
State of ICS and IoT Cyber Threat Landscape Report 2024 previewPrayukth K V
The IoT and OT threat landscape report has been prepared by the Threat Research Team at Sectrio using data from Sectrio, cyber threat intelligence farming facilities spread across over 85 cities around the world. In addition, Sectrio also runs AI-based advanced threat and payload engagement facilities that serve as sinks to attract and engage sophisticated threat actors, and newer malware including new variants and latent threats that are at an earlier stage of development.
The latest edition of the OT/ICS and IoT security Threat Landscape Report 2024 also covers:
State of global ICS asset and network exposure
Sectoral targets and attacks as well as the cost of ransom
Global APT activity, AI usage, actor and tactic profiles, and implications
Rise in volumes of AI-powered cyberattacks
Major cyber events in 2024
Malware and malicious payload trends
Cyberattack types and targets
Vulnerability exploit attempts on CVEs
Attacks on counties – USA
Expansion of bot farms – how, where, and why
In-depth analysis of the cyber threat landscape across North America, South America, Europe, APAC, and the Middle East
Why are attacks on smart factories rising?
Cyber risk predictions
Axis of attacks – Europe
Systemic attacks in the Middle East
Download the full report from here:
https://sectrio.com/resources/ot-threat-landscape-reports/sectrio-releases-ot-ics-and-iot-security-threat-landscape-report-2024/
PHP Frameworks: I want to break free (IPC Berlin 2024)Ralf Eggert
In this presentation, we examine the challenges and limitations of relying too heavily on PHP frameworks in web development. We discuss the history of PHP and its frameworks to understand how this dependence has evolved. The focus will be on providing concrete tips and strategies to reduce reliance on these frameworks, based on real-world examples and practical considerations. The goal is to equip developers with the skills and knowledge to create more flexible and future-proof web applications. We'll explore the importance of maintaining autonomy in a rapidly changing tech landscape and how to make informed decisions in PHP development.
This talk is aimed at encouraging a more independent approach to using PHP frameworks, moving towards a more flexible and future-proof approach to PHP development.
Neuro-symbolic is not enough, we need neuro-*semantic*Frank van Harmelen
Neuro-symbolic (NeSy) AI is on the rise. However, simply machine learning on just any symbolic structure is not sufficient to really harvest the gains of NeSy. These will only be gained when the symbolic structures have an actual semantics. I give an operational definition of semantics as “predictable inference”.
All of this illustrated with link prediction over knowledge graphs, but the argument is general.
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...UiPathCommunity
💥 Speed, accuracy, and scaling – discover the superpowers of GenAI in action with UiPath Document Understanding and Communications Mining™:
See how to accelerate model training and optimize model performance with active learning
Learn about the latest enhancements to out-of-the-box document processing – with little to no training required
Get an exclusive demo of the new family of UiPath LLMs – GenAI models specialized for processing different types of documents and messages
This is a hands-on session specifically designed for automation developers and AI enthusiasts seeking to enhance their knowledge in leveraging the latest intelligent document processing capabilities offered by UiPath.
Speakers:
👨🏫 Andras Palfi, Senior Product Manager, UiPath
👩🏫 Lenka Dulovicova, Product Program Manager, UiPath
Transcript: Selling digital books in 2024: Insights from industry leaders - T...BookNet Canada
The publishing industry has been selling digital audiobooks and ebooks for over a decade and has found its groove. What’s changed? What has stayed the same? Where do we go from here? Join a group of leading sales peers from across the industry for a conversation about the lessons learned since the popularization of digital books, best practices, digital book supply chain management, and more.
Link to video recording: https://bnctechforum.ca/sessions/selling-digital-books-in-2024-insights-from-industry-leaders/
Presented by BookNet Canada on May 28, 2024, with support from the Department of Canadian Heritage.
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Jeffrey Haguewood
Sidekick Solutions uses Bonterra Impact Management (fka Social Solutions Apricot) and automation solutions to integrate data for business workflows.
We believe integration and automation are essential to user experience and the promise of efficient work through technology. Automation is the critical ingredient to realizing that full vision. We develop integration products and services for Bonterra Case Management software to support the deployment of automations for a variety of use cases.
This video focuses on the notifications, alerts, and approval requests using Slack for Bonterra Impact Management. The solutions covered in this webinar can also be deployed for Microsoft Teams.
Interested in deploying notification automations for Bonterra Impact Management? Contact us at sales@sidekicksolutionsllc.com to discuss next steps.
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Ramesh Iyer
In today's fast-changing business world, Companies that adapt and embrace new ideas often need help to keep up with the competition. However, fostering a culture of innovation takes much work. It takes vision, leadership and willingness to take risks in the right proportion. Sachin Dev Duggal, co-founder of Builder.ai, has perfected the art of this balance, creating a company culture where creativity and growth are nurtured at each stage.
This is an architecture of a Business Data Lake. It is centered around Hadoop-based storage. It includes tools and components for ingesting data from different kinds of data sources, processing data for analytics and insights, and for supporting applications that utilize data, implement insights, and contribute data back to the data lake. In this presentation, we will look at the various components of a business data lake architecture, and show how to use it to maximize the value of your company’s data.
Let’s first look at why Hadoop and HDFS for a data lake makes a lot of sense.
Hadoop, and its underlying Hadoop File System, or HDFS, is a distributed file system that supports arbitrarily large clusters. This means your data storage can theoretically be as large as needed to fit your needs. You simply add more clusters as you need more space.
HDFS is schema-less, which means it can support files of any type and format. This is great for storing unstructured or semi-structured data, as well as non relational data formats such as binary streams from sensors, image data, machine logging. It’s also just fine for storing for structured, relational tabular data.
When your data storage can take any kind of data from any kind of source, allowing this data to be loaded and stored can be a challenge. This is why a wide selection of tools for ingest is needed to implement a data lake.
Batch loading can be achieved with a variety of tools, depending on additional sources needed.
Sqoop, for example, is great for handling large data batch loading, and can even pull data from legacy databases.
On the other hand, if your bulk loading operation needs some additional processing on it – such as you want to transform data from one format to another or create metadata, and if you want to be able to create analytics, then another open source tool, Spring XD, is available and provides scale and flexibility to handle your specific needs.
Microbatch – in other words, smaller, but recurring batch loads, such as data change deltas or event-triggered updates, is handled well by Flume.
Storing high-velocity data into Hadoop is a different challenge altogether. Considering that your source could be in any volume in addition to speed. If ensuring you store all the data is paramount, you need tools that can capture and queuedata in any scale or volume until the Hadoop cluster is able to store.
A data lake based on Pivotal Big Data Suite has two tools built for these use cases. In fact they can work together:
Spring XD can scale to handle data streaming at real time, and provide the same capabilities of processing and analyzing.
Pivotal GemFire XD can work with Spring XD to provide advanced database operations such search for duplicates in a window of time, for example, and allows you to ensure consistency of data in writes. Since it it’s a SQL-based database, it’s also great for helping convert or add structure to ingested data.
Once you have the ability to store and load data into your data lake, the next is deriving business value by processing, gaining insights, and taking action on the data.
It’s great that one can can get any kind of data into an HDFS data store. However, to be able to conduct advanced analytics on it, you often need to make it accessible to structured-based analysis tools.
This kind of processing may involve direct transformation of file types, or it might simply mean analyzing and creating meta data about the file type.
This can be done on ingest with some of the tools described, or can be processed after being stored in Hadoop.
Examples might be transforming binary image formats into RDBMS tables to enable large scale image processing, or even simple ETL processes on web logs so that it can later be turned into fact tables.
Once you have structure applied to your data, its possible to leverage SQL-based tools to do fast processing on your data for advanced analytics and data science. Only HAWQ provides full analytic SQL support on Hadoop in massive parallel processing. This allows you enjoy very high performance leveraging advanced analytics functions in MADlib, as well as when using analytics applications such as SAS.
With structure applied to your data, and the ability to deploy advanced analytics, now you can start doing some very powerful investigation, which is actually supported by Hadoop.
By discovering relationships between otherwise seemingly unrelated data sets, its possible to discover correlations and potential causation, and create multi-dimensional analytical models that have higher precision in predictive analytics.
Since HDFS allows you to store as much data as you want at a very cheap price, its possible to store larger detail data sets such as time series feeds, and application logs. In traditional data warehousing, ETL processes will aggregate and summarize this information, and lose detail for purposes of facilitating reporting. By saving the detail, its possible to run machine learning algorithms on the data to help build more accurate predictive analytics.
Distributed in-memory databases such as Pivotal GemFire XD make it possible to deploy real-time data-driven automation at scale. This means you can deploy applications for responding to and processing incoming streaming data such as for Internet of Things applications, or support large scale mobile-web applications. You want to create intelligent user experiences, and provide smart automation and processing in the backend. You also want to be able to capture and store detailed logging of all interactions for further analysis.
The ability to deploy automation at scale, capture and store all data, and analyze to discover insights and algorithms is an ongoing process of continuous improvement and innovation.
Using the full capabilities of a data lake from storing massive data sets to achieving coninuous innovation allows your company to maximize the business value it generates off its data.