Explores the notion of "Hadoop as a Data Refinery" within an organisation, be it one with an existing Business Intelligence system or none - looks at 'agile data' as a a benefit of using Hadoop as the store for historical, unstructured and very-large-scale datasets.
The final slides look at the challenge of an organisation becoming "data driven"
Introduction to Hortonworks Data Platform for WindowsHortonworks
According to IDC, Windows Servers run more than 50% of the servers in the Enterprise Data Center. Hortonworks has worked closely with Microsoft to port Apache Hadoop to Windows to enable organizations to take advantage of this emerging Big Data technology. Join us in this informative webinar to hear about the new Hortonworks Data Platform for Windows.
In less than an hour, you’ll learn:
-Key capabilities available in Hortonworks Data Platform for Windows
-How HDP for Windows integrates with Microsoft tools
-Key workloads and use cases for driving Hadoop today
The Modern Data Architecture for Advanced Business Intelligence with Hortonwo...Hortonworks
There certainly is no shortage of hype when it comes to the term “Big Data”. One thing we can be sure of is that massive data volumes are driving a new modern data architecture that includes Hadoop in the mix. But what does that architecture look like for Business Intelligence Data Strategy?
Join Hortonworks and MicroStrategy, where we’ll:
• Discuss the modern architecture for Business Intelligence on top of Hadoop as a data source.
• Learn how our joint solution helps enterprises store, process and analyze vast amounts of structured and unstructured data to deliver business insights throughout an organization.
• Discover what new benefits Hadoop 2.0 offers and how the MicroStrategy Analytics platform leverages those new features to improve performance, achieve faster access times, and allow for true interactive visual data discovery.
Hadoop Reporting and Analysis - JaspersoftHortonworks
Hadoop is deployed for a variety of uses, including web analytics, fraud detection, security monitoring, healthcare, environmental analysis, social media monitoring, and other purposes.
Oncrawl elasticsearch meetup france #12Tanguy MOAL
Presentation detailing how Elasticsearch is involved in Oncrawl, a SaaS solution for easy SEO monitoring.
The presentation explains how the application is built, and how it integrates Elasticsearch, a powerful general purpose search engine.
Oncrawl is data centric and elasticsearch is used as an analytics engine rather than a full text search engine.
The application uses Apache Hadoop and Apache Nutch for the crawl pipeline and data analysis.
Oncrawl is a Cogniteev solution.
Introduction to Hortonworks Data Platform for WindowsHortonworks
According to IDC, Windows Servers run more than 50% of the servers in the Enterprise Data Center. Hortonworks has worked closely with Microsoft to port Apache Hadoop to Windows to enable organizations to take advantage of this emerging Big Data technology. Join us in this informative webinar to hear about the new Hortonworks Data Platform for Windows.
In less than an hour, you’ll learn:
-Key capabilities available in Hortonworks Data Platform for Windows
-How HDP for Windows integrates with Microsoft tools
-Key workloads and use cases for driving Hadoop today
The Modern Data Architecture for Advanced Business Intelligence with Hortonwo...Hortonworks
There certainly is no shortage of hype when it comes to the term “Big Data”. One thing we can be sure of is that massive data volumes are driving a new modern data architecture that includes Hadoop in the mix. But what does that architecture look like for Business Intelligence Data Strategy?
Join Hortonworks and MicroStrategy, where we’ll:
• Discuss the modern architecture for Business Intelligence on top of Hadoop as a data source.
• Learn how our joint solution helps enterprises store, process and analyze vast amounts of structured and unstructured data to deliver business insights throughout an organization.
• Discover what new benefits Hadoop 2.0 offers and how the MicroStrategy Analytics platform leverages those new features to improve performance, achieve faster access times, and allow for true interactive visual data discovery.
Hadoop Reporting and Analysis - JaspersoftHortonworks
Hadoop is deployed for a variety of uses, including web analytics, fraud detection, security monitoring, healthcare, environmental analysis, social media monitoring, and other purposes.
Oncrawl elasticsearch meetup france #12Tanguy MOAL
Presentation detailing how Elasticsearch is involved in Oncrawl, a SaaS solution for easy SEO monitoring.
The presentation explains how the application is built, and how it integrates Elasticsearch, a powerful general purpose search engine.
Oncrawl is data centric and elasticsearch is used as an analytics engine rather than a full text search engine.
The application uses Apache Hadoop and Apache Nutch for the crawl pipeline and data analysis.
Oncrawl is a Cogniteev solution.
This was presented at NHN on Jan. 27, 2009.
It introduces Big Data, its storages, and its analyses.
Especially, it covers MapReduce debates and hybrid systems of RDBMS and MapReduce.
In addition, in terms of Schema-Free, various non-relational data storages are explained.
Neustar is a fast growing provider of enterprise services in telecommunications, online advertising, Internet infrastructure, and advanced technology. Neustar has engaged Think Big Analytics to leverage Hadoop to expand their data analysis capacity. This session describes how Hadoop has expanded their data warehouse capacity, agility for data analysis, reduced costs, and enabled new data products. We look at the challenges and opportunities in capturing 100′s of TB’s of compact binary network data, ad hoc analysis, integration with a scale out relational database, more agile data development, and building new products integrating multiple big data sets.
Modern Data Architecture: In-Memory with Hadoop - the new BIKognitio
Is Hadoop ready for high-concurrency complex BI and Advanced Analytics? Roaring performance and fast, low-latency execution is possible when an in-memory analytical platform is paired with the Apache Hadoop framework. Join Hortonworks and Kognitio for an informative Web Briefing on putting Hadoop at the center of your modern data architecture—with zero disruption to business users.
Extending the Data Warehouse with Hadoop - Hadoop world 2011Jonathan Seidman
Hadoop provides the ability to extract business intelligence from extremely large, heterogeneous data sets that were previously impractical to store and process in traditional data warehouses. The challenge now is in bridging the gap between the data warehouse and Hadoop. In this talk we’ll discuss some steps that Orbitz has taken to bridge this gap, including examples of how Hadoop and Hive are used to aggregate data from large data sets, and how that data can be combined with relational data to create new reports that provide actionable intelligence to business users.
Microsoft and Hortonworks Delivers the Modern Data Architecture for Big DataHortonworks
Joint webinar with Microsoft and Hortonworns on the power of combining the Hortonworks Data Platform with Microsoft’s ubiquitous Windows, Office, SQL Server, Parallel Data Warehouse, and Azure platform to build the Modern Data Architecture for Big Data.
As more organizations look to Hadoop as the technology solution for big data analytics, common questions arise.
Join us in this case study look at an online services provider's experience with Big Data and how they answered the questions:
*What does big data analytics do that my existing BI software doesn’t?
*Will Hadoop replace my data warehouse?
*What about Hive?
Hadoop 2.0: YARN to Further Optimize Data ProcessingHortonworks
Data is exponentially increasing in both types and volumes, creating opportunities for businesses. Watch this video and learn from three Big Data experts: John Kreisa, VP Strategic Marketing at Hortonworks, Imad Birouty, Director of Technical Product Marketing at Teradata and John Haddad, Senior Director of Product Marketing at Informatica.
Multiple systems are needed to exploit the variety and volume of data sources, including a flexible data repository. Learn more about:
- Apache Hadoop 2 and YARN
- Data Lakes
- Intelligent data management layers needed to manage metadata and usage patterns as well as track consumption across these data platforms.
Mr. Slim Baltagi is a Systems Architect at Hortonworks, with over 4 years of Hadoop experience working on 9 Big Data projects: Advanced Customer Analytics, Supply Chain Analytics, Medical Coverage Discovery, Payment Plan Recommender, Research Driven Call List for Sales, Prime Reporting Platform, Customer Hub, Telematics, Historical Data Platform; with Fortune 100 clients and global companies from Financial Services, Insurance, Healthcare and Retail.
Mr. Slim Baltagi has worked in various architecture, design, development and consulting roles at.
Accenture, CME Group, TransUnion, Syntel, Allstate, TransAmerica, Credit Suisse, Chicago Board Options Exchange, Federal Reserve Bank of Chicago, CNA, Sears, USG, ACNielsen, Deutshe Bahn.
Mr. Baltagi has also over 14 years of IT experience with an emphasis on full life cycle development of Enterprise Web applications using Java and Open-Source software. He holds a master’s degree in mathematics and is an ABD in computer science from Université Laval, Québec, Canada.
Languages: Java, Python, JRuby, JEE , PHP, SQL, HTML, XML, XSLT, XQuery, JavaScript, UML, JSON
Databases: Oracle, MS SQL Server, MYSQL, PostreSQL
Software: Eclipse, IBM RAD, JUnit, JMeter, YourKit, PVCS, CVS, UltraEdit, Toad, ClearCase, Maven, iText, Visio, Japser Reports, Alfresco, Yslow, Terracotta, Toad, SoapUI, Dozer, Sonar, Git
Frameworks: Spring, Struts, AppFuse, SiteMesh, Tiles, Hibernate, Axis, Selenium RC, DWR Ajax , Xstream
Distributed Computing/Big Data: Hadoop, MapReduce, HDFS, Hive, Pig, Sqoop, HBase, R, RHadoop, Cloudera CDH4, MapR M7, Hortonworks HDP 2.1
Join Cloudian, Hortonworks and 451 Research for a panel-style Q&A discussion about the latest trends and technology innovations in Big Data and Analytics. Matt Aslett, Data Platforms and Analytics Research Director at 451 Research, John Kreisa, Vice President of Strategic Marketing at Hortonworks, and Paul Turner, Chief Marketing Officer at Cloudian, will answer your toughest questions about data storage, data analytics, log data, sensor data and the Internet of Things. Bring your questions or just come and listen!
Real-time Analytics for Data-Driven ApplicationsVMware Tanzu
SpringOne Platform 2017
Milind Bhandarkar, Ampool
"To provide hyper-personalized digital experiences in the emerging market transformation, innovative enterprises are building modern data-driven applications to deliver continuing value to their always-connected customers. Such applications need to utilize closed-loop deep insights to influence their users' behaviors in real-time. However, the traditional ways of capturing users' interactions, transporting data to large data warehouses or data lakes, further away from applications, and processing these data across multiple slow stages cannot meet the real-time expectations of both customers and businesses.
What if one could capture, analyze, and serve data from a highly concurrent, high-performance data store powering these applications? In this talk, we'll present a memory-centric Active Data Store (ADS), powered by Apache Geode, to meet the exigent demands of modern applications while providing operational simplicity. Ampool's ADS allows fast ingest and storage of 'hot' app data, in situ updates and analysis, and data serving from the same scalable distributed in-memory data store. As the data cools (ages), Ampool ADS automatically tiers data to warm and cold secondary stores. By speeding analytics several-fold, Ampool enables feeding actionable insights back to applications, driving decisions in a closed loop.
We will demonstrate the applicability of Ampool ADS for such an app by serving all data-access patterns from a single memory-centric store."
Video: http://www.youtube.com/watch?v=BT8WvQMMaV0
Hadoop is the technology of choice for processing large data sets. At salesforce.com, we service internal and product big data use cases using a combination of Hadoop, Java MapReduce, Pig, Force.com, and machine learning algorithms. In this webinar, we will discuss an internal use case and a product use case:
Product Metrics: Internally, we measure feature usage using a combination of Hadoop, Pig, and the Force.com platform (Custom Objects and Analytics).
Community-Based Recommendations: In Chatter, our most successful people and file recommendations are built on a collaborative filtering algorithm that is implemented on Hadoop using Java MapReduce.
Simplified Data Management And Process Scheduling in HadoopGetInData
If you want to stay up to date, subscribe to our newsletter here: https://bit.ly/3tiw1I8
Proper data management and process scheduling are challenges that many data-driven companies under-prioritize. Although it might not cause troubles in short run, it becomes a nightmare when your cluster grows. However, even when you realize this problem, you might not see that possible solutions are so close... In this talk, we share how we simplified our data management and process scheduling in Hadoop with useful (but less adopted) open-source tools. We describe how Falcon, HCatalog, Avro, HDFS FsImage, CLI tools and tricks helped us to address typical problems related to orchestration of data pipelines and discovery, retention, lineage of datasets.
This presentation accompanied a practical demonstration of Amazon's Elastic Computing services to CNET students at the University of Plymouth on 16/03/2010.
The practical demonstration involved an obviously parallel problem split on 5 Medium size AMIs. The problem was the calculation of the Clustering Coefficient and the Mean Path Length (Based on the original work done by Watts and Strogatz) for large networks. The code was written in Python taking advantage of the scipy, pyparallel and networkx toolkits
This was presented at NHN on Jan. 27, 2009.
It introduces Big Data, its storages, and its analyses.
Especially, it covers MapReduce debates and hybrid systems of RDBMS and MapReduce.
In addition, in terms of Schema-Free, various non-relational data storages are explained.
Neustar is a fast growing provider of enterprise services in telecommunications, online advertising, Internet infrastructure, and advanced technology. Neustar has engaged Think Big Analytics to leverage Hadoop to expand their data analysis capacity. This session describes how Hadoop has expanded their data warehouse capacity, agility for data analysis, reduced costs, and enabled new data products. We look at the challenges and opportunities in capturing 100′s of TB’s of compact binary network data, ad hoc analysis, integration with a scale out relational database, more agile data development, and building new products integrating multiple big data sets.
Modern Data Architecture: In-Memory with Hadoop - the new BIKognitio
Is Hadoop ready for high-concurrency complex BI and Advanced Analytics? Roaring performance and fast, low-latency execution is possible when an in-memory analytical platform is paired with the Apache Hadoop framework. Join Hortonworks and Kognitio for an informative Web Briefing on putting Hadoop at the center of your modern data architecture—with zero disruption to business users.
Extending the Data Warehouse with Hadoop - Hadoop world 2011Jonathan Seidman
Hadoop provides the ability to extract business intelligence from extremely large, heterogeneous data sets that were previously impractical to store and process in traditional data warehouses. The challenge now is in bridging the gap between the data warehouse and Hadoop. In this talk we’ll discuss some steps that Orbitz has taken to bridge this gap, including examples of how Hadoop and Hive are used to aggregate data from large data sets, and how that data can be combined with relational data to create new reports that provide actionable intelligence to business users.
Microsoft and Hortonworks Delivers the Modern Data Architecture for Big DataHortonworks
Joint webinar with Microsoft and Hortonworns on the power of combining the Hortonworks Data Platform with Microsoft’s ubiquitous Windows, Office, SQL Server, Parallel Data Warehouse, and Azure platform to build the Modern Data Architecture for Big Data.
As more organizations look to Hadoop as the technology solution for big data analytics, common questions arise.
Join us in this case study look at an online services provider's experience with Big Data and how they answered the questions:
*What does big data analytics do that my existing BI software doesn’t?
*Will Hadoop replace my data warehouse?
*What about Hive?
Hadoop 2.0: YARN to Further Optimize Data ProcessingHortonworks
Data is exponentially increasing in both types and volumes, creating opportunities for businesses. Watch this video and learn from three Big Data experts: John Kreisa, VP Strategic Marketing at Hortonworks, Imad Birouty, Director of Technical Product Marketing at Teradata and John Haddad, Senior Director of Product Marketing at Informatica.
Multiple systems are needed to exploit the variety and volume of data sources, including a flexible data repository. Learn more about:
- Apache Hadoop 2 and YARN
- Data Lakes
- Intelligent data management layers needed to manage metadata and usage patterns as well as track consumption across these data platforms.
Mr. Slim Baltagi is a Systems Architect at Hortonworks, with over 4 years of Hadoop experience working on 9 Big Data projects: Advanced Customer Analytics, Supply Chain Analytics, Medical Coverage Discovery, Payment Plan Recommender, Research Driven Call List for Sales, Prime Reporting Platform, Customer Hub, Telematics, Historical Data Platform; with Fortune 100 clients and global companies from Financial Services, Insurance, Healthcare and Retail.
Mr. Slim Baltagi has worked in various architecture, design, development and consulting roles at.
Accenture, CME Group, TransUnion, Syntel, Allstate, TransAmerica, Credit Suisse, Chicago Board Options Exchange, Federal Reserve Bank of Chicago, CNA, Sears, USG, ACNielsen, Deutshe Bahn.
Mr. Baltagi has also over 14 years of IT experience with an emphasis on full life cycle development of Enterprise Web applications using Java and Open-Source software. He holds a master’s degree in mathematics and is an ABD in computer science from Université Laval, Québec, Canada.
Languages: Java, Python, JRuby, JEE , PHP, SQL, HTML, XML, XSLT, XQuery, JavaScript, UML, JSON
Databases: Oracle, MS SQL Server, MYSQL, PostreSQL
Software: Eclipse, IBM RAD, JUnit, JMeter, YourKit, PVCS, CVS, UltraEdit, Toad, ClearCase, Maven, iText, Visio, Japser Reports, Alfresco, Yslow, Terracotta, Toad, SoapUI, Dozer, Sonar, Git
Frameworks: Spring, Struts, AppFuse, SiteMesh, Tiles, Hibernate, Axis, Selenium RC, DWR Ajax , Xstream
Distributed Computing/Big Data: Hadoop, MapReduce, HDFS, Hive, Pig, Sqoop, HBase, R, RHadoop, Cloudera CDH4, MapR M7, Hortonworks HDP 2.1
Join Cloudian, Hortonworks and 451 Research for a panel-style Q&A discussion about the latest trends and technology innovations in Big Data and Analytics. Matt Aslett, Data Platforms and Analytics Research Director at 451 Research, John Kreisa, Vice President of Strategic Marketing at Hortonworks, and Paul Turner, Chief Marketing Officer at Cloudian, will answer your toughest questions about data storage, data analytics, log data, sensor data and the Internet of Things. Bring your questions or just come and listen!
Real-time Analytics for Data-Driven ApplicationsVMware Tanzu
SpringOne Platform 2017
Milind Bhandarkar, Ampool
"To provide hyper-personalized digital experiences in the emerging market transformation, innovative enterprises are building modern data-driven applications to deliver continuing value to their always-connected customers. Such applications need to utilize closed-loop deep insights to influence their users' behaviors in real-time. However, the traditional ways of capturing users' interactions, transporting data to large data warehouses or data lakes, further away from applications, and processing these data across multiple slow stages cannot meet the real-time expectations of both customers and businesses.
What if one could capture, analyze, and serve data from a highly concurrent, high-performance data store powering these applications? In this talk, we'll present a memory-centric Active Data Store (ADS), powered by Apache Geode, to meet the exigent demands of modern applications while providing operational simplicity. Ampool's ADS allows fast ingest and storage of 'hot' app data, in situ updates and analysis, and data serving from the same scalable distributed in-memory data store. As the data cools (ages), Ampool ADS automatically tiers data to warm and cold secondary stores. By speeding analytics several-fold, Ampool enables feeding actionable insights back to applications, driving decisions in a closed loop.
We will demonstrate the applicability of Ampool ADS for such an app by serving all data-access patterns from a single memory-centric store."
Video: http://www.youtube.com/watch?v=BT8WvQMMaV0
Hadoop is the technology of choice for processing large data sets. At salesforce.com, we service internal and product big data use cases using a combination of Hadoop, Java MapReduce, Pig, Force.com, and machine learning algorithms. In this webinar, we will discuss an internal use case and a product use case:
Product Metrics: Internally, we measure feature usage using a combination of Hadoop, Pig, and the Force.com platform (Custom Objects and Analytics).
Community-Based Recommendations: In Chatter, our most successful people and file recommendations are built on a collaborative filtering algorithm that is implemented on Hadoop using Java MapReduce.
Simplified Data Management And Process Scheduling in HadoopGetInData
If you want to stay up to date, subscribe to our newsletter here: https://bit.ly/3tiw1I8
Proper data management and process scheduling are challenges that many data-driven companies under-prioritize. Although it might not cause troubles in short run, it becomes a nightmare when your cluster grows. However, even when you realize this problem, you might not see that possible solutions are so close... In this talk, we share how we simplified our data management and process scheduling in Hadoop with useful (but less adopted) open-source tools. We describe how Falcon, HCatalog, Avro, HDFS FsImage, CLI tools and tricks helped us to address typical problems related to orchestration of data pipelines and discovery, retention, lineage of datasets.
This presentation accompanied a practical demonstration of Amazon's Elastic Computing services to CNET students at the University of Plymouth on 16/03/2010.
The practical demonstration involved an obviously parallel problem split on 5 Medium size AMIs. The problem was the calculation of the Clustering Coefficient and the Mean Path Length (Based on the original work done by Watts and Strogatz) for large networks. The code was written in Python taking advantage of the scipy, pyparallel and networkx toolkits
Hadoop Cluster Configuration and Data Loading - Module 2Rohit Agrawal
Learning Objectives - In this module, you will learn the Hadoop Cluster Architecture and Setup, Important Configuration files in a Hadoop Cluster, Data Loading Techniques.
I have studied on Big Data analysis and found Hadoop is the best technology and most popular as well for it's distributed data processing approaches. I have gathered all possible information about various Hadoop distributions available in the market and tried to describe most important tools and their functionality in the Hadoop echosystems in this slide show. I have also tried to discuss about connectivity with language R interm of data analysis and visualization perspective. Hope you will be enjoying the whole!
Integrate Hue with your Hadoop cluster - Yahoo! Hadoop Meetupgethue
This talk will describe how Hue can be integrated with existing Hadoop deployments with minimal changes/disturbances. Romain will cover details on how Hue can leverage the existing authentication system and security model of your company. He will also cover the Hive/Shark/Pig/Oozie best practice setup for Hue.
http://www.meetup.com/hadoop/events/125191612/
Hadoop World 2011: The Hadoop Stack - Then, Now and in the Future - Eli Colli...Cloudera, Inc.
Many people refer to Apache Hadoop as their system of choice for big data management but few actually use just Apache Hadoop. Hadoop has become a proxy for a much larger system which has HDFS storage at its core. The Apache Hadoop based "big data stack" has changed dramatically over the past 24 months and will chance even more over the next 24 months. This talk talks about trends in the evolution of the Hadoop stack, change in architecture and changes in the kinds of use cases that are supported. It will also talk about the role of interoperability and cohesion in the Apache Hadoop stack and the role of Apache Bigtop in this regard.
Data ingest is a deceptively hard problem. In the world of big data processing, it becomes exponentially more difficult. It's not sufficient to simply land data on a system, that data must be ready for processing and analysis. The Kite SDK is a data API designed for solving the issues related to data infest and preparation. In this talk you'll see how Kite can be used for everything from simple tasks to production ready data pipelines in minutes.
Create a Smarter Data Lake with HP Haven and Apache HadoopHortonworks
An organization’s information is spread across multiple repositories, on-premise and in the cloud, with limited ability to correlate information and derive insights. The Smart Content Hub solution from HP and Hortonworks enables a shared content infrastructure that transparently synchronizes information with existing systems and offers an open standards-based platform for deep analysis and data monetization.
- Leverage 100% of your data: Text, images, audio, video, and many more data types can be automatically consumed and enriched using HP Haven (powered by HP IDOL and HP Vertica), making it possible to integrate this valuable content and insights into various line of business applications.
- Democratize and enable multi-dimensional content analysis: - Empower your analysts, business users, and data scientists to search and analyze Hadoop data with ease, using the 100% open source Hortonworks Data Platform.
- Extend the enterprise data warehouse: Synchronize and manage content from content management systems, and crack open the files in whatever format they happen to be in.
- Dramatically reduce complexity with enterprise-ready SQL engine: Tap into the richest analytics that support JOINs, complex data types, and other capabilities only available with HP Vertica SQL on the Hortonworks Data Platform.
Speakers:
- Ajay Singh, Director, Technical Channels, Hortonworks
- Will Gardella, Product Management, HP Big Data
Are you confused by Big Data? Get in touch with this new "black gold" and familiarize yourself with undiscovered insights through our complimentary introductory lesson on Big Data and Hadoop!
Introducing the Big Data Ecosystem with Caserta Concepts & TalendCaserta
In this one-hour webinar, Caserta Concepts and Talend described an approach to achieve an architectural framework and roadmap to extend a traditional enterprise data warehouse environment, into a Big Data ecosystem.
They illustrated the architectural components involved for collecting, analyzing and delivering Big Data, with a focus on the importance of Hadoop, Data Integration, Machine Learning, NoSQL, Business Intelligence and Analytics.
Attendees learned:
Which Big Data technologies can’t be ignored
Considerations when extending the data ecosystem
What happens to your existing investment
What are the points of integration
Does Big Data = better data?
To find access the recorded webinar or to learn more, visit http://www.casertaconcepts.com/.
Big Data 2.0: YARN Enablement for Distributed ETL & SQL with HadoopCaserta
In our most recent Big Data Warehousing Meetup, we learned about transitioning from Big Data 1.0 with Hadoop 1.x with nascent technologies to the advent of Hadoop 2.x with YARN to enable distributed ETL, SQL and Analytics solutions. Caserta Concepts Chief Architect Elliott Cordo and an Actian Engineer covered the complete data value chain of an Enterprise-ready platform including data connectivity, collection, preparation, optimization and analytics with end user access.
For more information on our services or upcoming events, please visit our website at http://www.casertaconcepts.com/.
Supporting Financial Services with a More Flexible Approach to Big DataHortonworks
Financial services companies can reap tremendous benefits from 'Big Data' and they have moved quickly to deploy it. But these companies also place heavy demands on 'Big Data' infrastructure for flexibility, reliability and performance. In this webinar, Hortonworks joins WANDisco to look at three examples of using 'Big Data' to get a more comprehensive view of customer behavior and activity in the banking and insurance industries. Then we'll pull out the common threads from these examples, and see how a flexible next-generation Hadoop architecture lets you get a step up on improving your business performance. Join us to learn:
How to leverage data from across an entire global enterprise
How to analyze a wide variety of structured and unstructured data to get quick, meaningful answers to critical questions
What industry leaders have put in place
The Value of the Modern Data Architecture with Apache Hadoop and Teradata Hortonworks
This webinar discusses why Apache Hadoop most typically the technology underpinning "Big Data". How it fits in a modern data architecture and the current landscape of databases and data warehouses that are already in use.
Combine Apache Hadoop and Elasticsearch to Get the Most of Your Big DataHortonworks
Hadoop is a great platform for storing and processing massive amounts of data. Elasticsearch is the ideal solution for Searching and Visualizing the same data. Join us to learn how you can leverage the full power of both platforms to maximize the value of your Big Data.
In this webinar we'll walk you through:
How Elasticsearch fits in the Modern Data Architecture.
A demo of Elasticsearch and Hortonworks Data Platform.
Best practices for combining Elasticsearch and Hortonworks Data Platform to extract maximum insights from your data.
Apache Hadoop and its role in Big Data architecture - Himanshu Barijaxconf
In today’s world of exponentially growing big data, enterprises are becoming increasingly more aware of the business utility and necessity of harnessing, storing and analyzing this information. Apache Hadoop has rapidly evolved to become a leading platform for managing and processing big data, with the vital management, monitoring, metadata and integration services required by organizations to glean maximum business value and intelligence from their burgeoning amounts of information on customers, web trends, products and competitive markets. In this session, Hortonworks' Himanshu Bari will discuss the opportunities for deriving business value from big data by looking at how organizations utilize Hadoop to store, transform and refine large volumes of this multi-structured information. Connolly will also discuss the evolution of Apache Hadoop and where it is headed, the component requirements of a Hadoop-powered platform, as well as solution architectures that allow for Hadoop integration with existing data discovery and data warehouse platforms. In addition, he will look at real-world use cases where Hadoop has helped to produce more business value, augment productivity or identify new and potentially lucrative opportunities.
August 2018 version of my "What does rename() do", includes the full details on what the Hadoop MapReduce and Spark commit protocols are, so the audience will really understand why rename really, really matters
Put is the new rename: San Jose Summit EditionSteve Loughran
This is the June 2018 variant of the "Put is the new Rename Talk", looking at Hadoop stack integration with object stores, including S3, Azure storage and GCS.
The lessons from implementing a twitter bot designed to live on a raspberry pi and heckle politicians —and deployed into production in the 2017 UK General Election
A review of the state of cloud store integration with the Hadoop stack in 2018; including S3Guard, the new S3A committers and S3 Select.
Presented at Dataworks Summit Berlin 2018, where the demos were live.
Berlin Buzzwords 2017 talk: A look at what our storage models, metaphors and APIs are, showing how we need to rethink the Posix APIs to work with object stores, while looking at different alternatives for local NVM.
This is the unabridged talk; the BBuzz talk was 20 minutes including demo and questions, so had ~half as many slides
Dancing Elephants: Working with Object Storage in Apache Spark and HiveSteve Loughran
A talk looking at the intricate details of working with an object store from Hadoop, Hive, Spark, etc, why the "filesystem" metaphor falls down, and what work myself and others have been up to to try and fix things
Apache Spark and Object Stores —for London Spark User GroupSteve Loughran
The March 2017 version of the "Apache Spark and Object Stores", includes coverage of the Staging Committer. If you'd been at the talk you'd have seen the projector fail just before the demo. It worked earlier! Honest!
Cloud deployments of Apache Hadoop are becoming more commonplace. Yet Hadoop and it's applications don't integrate that well —something which starts right down at the file IO operations. This talk looks at how to make use of cloud object stores in Hadoop applications, including Hive and Spark. It will go from the foundational "what's an object store?" to the practical "what should I avoid" and the timely "what's new in Hadoop?" — the latter covering the improved S3 support in Hadoop 2.8+. I'll explore the details of benchmarking and improving object store IO in Hive and Spark, showing what developers can do in order to gain performance improvements in their own code —and equally, what they must avoid. Finally, I'll look at ongoing work, especially "S3Guard" and what its fast and consistent file metadata operations promise.
My talk from Berlin Buzzwords 2016, looking at whether it is actually possible to lock down a household to the extent you could call it "secure". I also try to highlight that we need to consider "privacy" alongside security.
Hadoop and Kerberos: the Madness Beyond the Gate: January 2016 editionSteve Loughran
An update of the "Hadoop and Kerberos: the Madness Beyond the Gate" talk, covering recent work "the Fix Kerberos" JIRA and its first deliverable: KDiag
Connector Corner: Automate dynamic content and events by pushing a buttonDianaGray10
Here is something new! In our next Connector Corner webinar, we will demonstrate how you can use a single workflow to:
Create a campaign using Mailchimp with merge tags/fields
Send an interactive Slack channel message (using buttons)
Have the message received by managers and peers along with a test email for review
But there’s more:
In a second workflow supporting the same use case, you’ll see:
Your campaign sent to target colleagues for approval
If the “Approve” button is clicked, a Jira/Zendesk ticket is created for the marketing design team
But—if the “Reject” button is pushed, colleagues will be alerted via Slack message
Join us to learn more about this new, human-in-the-loop capability, brought to you by Integration Service connectors.
And...
Speakers:
Akshay Agnihotri, Product Manager
Charlie Greenberg, Host
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Albert Hoitingh
In this session I delve into the encryption technology used in Microsoft 365 and Microsoft Purview. Including the concepts of Customer Key and Double Key Encryption.
State of ICS and IoT Cyber Threat Landscape Report 2024 previewPrayukth K V
The IoT and OT threat landscape report has been prepared by the Threat Research Team at Sectrio using data from Sectrio, cyber threat intelligence farming facilities spread across over 85 cities around the world. In addition, Sectrio also runs AI-based advanced threat and payload engagement facilities that serve as sinks to attract and engage sophisticated threat actors, and newer malware including new variants and latent threats that are at an earlier stage of development.
The latest edition of the OT/ICS and IoT security Threat Landscape Report 2024 also covers:
State of global ICS asset and network exposure
Sectoral targets and attacks as well as the cost of ransom
Global APT activity, AI usage, actor and tactic profiles, and implications
Rise in volumes of AI-powered cyberattacks
Major cyber events in 2024
Malware and malicious payload trends
Cyberattack types and targets
Vulnerability exploit attempts on CVEs
Attacks on counties – USA
Expansion of bot farms – how, where, and why
In-depth analysis of the cyber threat landscape across North America, South America, Europe, APAC, and the Middle East
Why are attacks on smart factories rising?
Cyber risk predictions
Axis of attacks – Europe
Systemic attacks in the Middle East
Download the full report from here:
https://sectrio.com/resources/ot-threat-landscape-reports/sectrio-releases-ot-ics-and-iot-security-threat-landscape-report-2024/
Securing your Kubernetes cluster_ a step-by-step guide to success !KatiaHIMEUR1
Today, after several years of existence, an extremely active community and an ultra-dynamic ecosystem, Kubernetes has established itself as the de facto standard in container orchestration. Thanks to a wide range of managed services, it has never been so easy to set up a ready-to-use Kubernetes cluster.
However, this ease of use means that the subject of security in Kubernetes is often left for later, or even neglected. This exposes companies to significant risks.
In this talk, I'll show you step-by-step how to secure your Kubernetes cluster for greater peace of mind and reliability.
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualityInflectra
In this insightful webinar, Inflectra explores how artificial intelligence (AI) is transforming software development and testing. Discover how AI-powered tools are revolutionizing every stage of the software development lifecycle (SDLC), from design and prototyping to testing, deployment, and monitoring.
Learn about:
• The Future of Testing: How AI is shifting testing towards verification, analysis, and higher-level skills, while reducing repetitive tasks.
• Test Automation: How AI-powered test case generation, optimization, and self-healing tests are making testing more efficient and effective.
• Visual Testing: Explore the emerging capabilities of AI in visual testing and how it's set to revolutionize UI verification.
• Inflectra's AI Solutions: See demonstrations of Inflectra's cutting-edge AI tools like the ChatGPT plugin and Azure Open AI platform, designed to streamline your testing process.
Whether you're a developer, tester, or QA professional, this webinar will give you valuable insights into how AI is shaping the future of software delivery.
Essentials of Automations: Optimizing FME Workflows with ParametersSafe Software
Are you looking to streamline your workflows and boost your projects’ efficiency? Do you find yourself searching for ways to add flexibility and control over your FME workflows? If so, you’re in the right place.
Join us for an insightful dive into the world of FME parameters, a critical element in optimizing workflow efficiency. This webinar marks the beginning of our three-part “Essentials of Automation” series. This first webinar is designed to equip you with the knowledge and skills to utilize parameters effectively: enhancing the flexibility, maintainability, and user control of your FME projects.
Here’s what you’ll gain:
- Essentials of FME Parameters: Understand the pivotal role of parameters, including Reader/Writer, Transformer, User, and FME Flow categories. Discover how they are the key to unlocking automation and optimization within your workflows.
- Practical Applications in FME Form: Delve into key user parameter types including choice, connections, and file URLs. Allow users to control how a workflow runs, making your workflows more reusable. Learn to import values and deliver the best user experience for your workflows while enhancing accuracy.
- Optimization Strategies in FME Flow: Explore the creation and strategic deployment of parameters in FME Flow, including the use of deployment and geometry parameters, to maximize workflow efficiency.
- Pro Tips for Success: Gain insights on parameterizing connections and leveraging new features like Conditional Visibility for clarity and simplicity.
We’ll wrap up with a glimpse into future webinars, followed by a Q&A session to address your specific questions surrounding this topic.
Don’t miss this opportunity to elevate your FME expertise and drive your projects to new heights of efficiency.
Generating a custom Ruby SDK for your web service or Rails API using Smithyg2nightmarescribd
Have you ever wanted a Ruby client API to communicate with your web service? Smithy is a protocol-agnostic language for defining services and SDKs. Smithy Ruby is an implementation of Smithy that generates a Ruby SDK using a Smithy model. In this talk, we will explore Smithy and Smithy Ruby to learn how to generate custom feature-rich SDKs that can communicate with any web service, such as a Rails JSON API.
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf91mobiles
91mobiles recently conducted a Smart TV Buyer Insights Survey in which we asked over 3,000 respondents about the TV they own, aspects they look at on a new TV, and their TV buying preferences.
DevOps and Testing slides at DASA ConnectKari Kakkonen
My and Rik Marselis slides at 30.5.2024 DASA Connect conference. We discuss about what is testing, then what is agile testing and finally what is Testing in DevOps. Finally we had lovely workshop with the participants trying to find out different ways to think about quality and testing in different parts of the DevOps infinity loop.
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Jeffrey Haguewood
Sidekick Solutions uses Bonterra Impact Management (fka Social Solutions Apricot) and automation solutions to integrate data for business workflows.
We believe integration and automation are essential to user experience and the promise of efficient work through technology. Automation is the critical ingredient to realizing that full vision. We develop integration products and services for Bonterra Case Management software to support the deployment of automations for a variety of use cases.
This video focuses on the notifications, alerts, and approval requests using Slack for Bonterra Impact Management. The solutions covered in this webinar can also be deployed for Microsoft Teams.
Interested in deploying notification automations for Bonterra Impact Management? Contact us at sales@sidekicksolutionsllc.com to discuss next steps.
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. Constant focus on speed to release software to market, along with the traditional slow and manual security checks has caused gaps in continuous security as an important piece in the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with PASSION for technology and making things work along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations in CI/CD and application security integrated in software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
23. Yahoo! Homepage
• Serving Maps SCIENCE » Machine learning to build ever
• Users - Interests HADOOP better categorization models
CLUSTER
• Five Minute CATEGORIZATION
USER
Production BEHAVIOR MODELS (weekly)
• Weekly PRODUCTION
Categorization HADOOP » Identify user interests using
CLUSTER
models SERVING Categorization models
MAPS
(every 5 minutes)
USER
BEHAVIOR
SERVING SYSTEMS ENGAGED USERS
Build customised home pages with latest data (thousands / second)
Copyright Yahoo 2011 25
24. Conclusions
Hadoop can live alongside existing BI
systems –as a data refinery
• Store, refine bulk & unstructured data
• Archive data for long-term analysis
• Support ad-hoc queries over bulk data
• Become the data-science platform
26
In the graphic above, Apache Hadoop acts as the Big Data Refinery. It’s great at storing, aggregating, and transforming multi-structured data into more useful and valuable formats.Apache Hive is a Hadoop-related component that fits within the Business Intelligence & Analytics category since it is commonly used for querying and analyzing data within Hadoop in a SQL-like manner. Apache Hadoop can also be integrated with other EDW, MPP, and NewSQL components such as Teradata, Aster Data, HP Vertica, IBM Netezza, EMC Greenplum, SAP Hana, Microsoft SQL Server PDW and many others.Apache HBase is a Hadoop-related NoSQL Key/Value store that is commonly used for building highly responsive next-generation applications. Apache Hadoop can also be integrated with other SQL, NoSQL, and NewSQL technologies such as Oracle, MySQL, PostgreSQL, Microsoft SQL Server, IBM DB2, MongoDB, DynamoDB, MarkLogic, Riak, Redis, Neo4J, Terracotta, GemFire, SQLFire, VoltDB and many others.Finally, data movement and integration technologies help ensure data flows seamlessly between the systems in the above diagrams; the lines in the graphic are powered by technologies such as WebHDFS, Apache HCatalog, Apache Sqoop, Talend Open Studio for Big Data, Informatica, Pentaho, SnapLogic, Splunk, Attunity and many others.
At the highest level, I describe three broad areas of data processing and outline how these areas interconnect.The three areas are:1.Business Transactions & Interactions2. Business Intelligence & Analytics3. Big Data RefineryThe graphic illustrates a vision for how these three types of systems can interconnect in ways aimed at deriving maximum value from all forms of data.Enterprise IT has been connecting systems via classic ETL processing, as illustrated in Step 1 above, for many years in order to deliver structured and repeatable analysis. In this step, the business determines the questions to ask and IT collects and structures the data needed to answer those questions.The “Big Data Refinery”, as highlighted in Step 2, is a new system capable of storing, aggregating, and transforming a wide range of multi-structured raw data sources into usable formats that help fuel new insights for the business. The Big Data Refinery provides a cost-effective platform for unlocking the potential value within data and discovering the business questions worth answering with this data. A popular example of big data refining is processing Web logs, clickstreams, social interactions, social feeds, and other user generated data sources into more accurate assessments of customer churn or more effective creation of personalized offers.More interestingly, there are businesses deriving value from processing large video, audio, and image files. Retail stores, for example, are leveraging in-store video feeds to help them better understand how customers navigate the aisles as they find and purchase products. Retailers that provide optimized shopping paths and intelligent product placement within their stores are able to drive more revenue for the business. In this case, while the video files may be big in size, the refined output of the analysis is typically small in size but potentially big in value.The Big Data Refinery platform provides fertile ground for new types of tools and data processing workloads to emerge in support of rich multi-level data refinement solutions.With that as backdrop, Step 3 takes the model further by showing how the Big Data Refinery interacts with the systems powering Business Transactions & Interactions and Business Intelligence & Analytics. Interacting in this way opens up the ability for businesses to get a richer and more informed 360 ̊ view of customers, for example.By directly integrating the Big Data Refinery with existing Business Intelligence & Analytics solutions that contain much of the transactional information for the business, companies can enhance their ability to more accurately understand the customer behaviors that lead to the transactions.Moreover, systems focused on Business Transactions & Interactions can also benefit from connecting with the Big Data Refinery. Complex analytics and calculations of key parameters can be performed in the refinery and flow downstream to fuel runtime models powering business applications with the goal of more accurately targeting customers with the best and most relevant offers, for example.Since the Big Data Refinery is great at retaining large volumes of data for long periods of time, the model is completed with the feedback loops illustrated in Steps 4 and 5. Retaining the past 10 years of historical “Black Friday” retail data, for example, can benefit the business, especially if it’s blended with other data sources such as 10 years of weather data accessed from a third party data provider. The point here is that the opportunities for creating value from multi-structured data sources available inside and outside the enterprise are virtually endless if you have a platform that can do it cost effectively and at scale.
Real world data is 'dirty' -you need to clean it upExamples: merge multiple events into one of an extended periodSanity check events against your world view (how fast things move, how much things cost). There is much danger here.text cleanup, discard empty fieldsYou may still want to retain the original data to see what was filtered -at the very least log & sample the outliers
This is taking a metaphor beyond the limits: all that comes next is photos of Grangemout or Milford Haven.Real world refineries have giant storage tanks to buffer differences between ingress and egress rates.Here we are proposing keeping data near the refinery
RCFile (Record Columnar File)http://en.wikipedia.org/wiki/RCFileHCatalog is a table abstraction and a storage abstraction system that makes it easy for multiple tools to interact with the same underlying data. A common buzzword in the NoSQL world today is that of polyglot persistence. Basically, what that comes down to is that you pick the right tool for the job. In the Hadoop ecosystem, you have many tools that might be used for data processing - you might use Pig or Hive, or your own custom MapReduce program, or that shiny new GUI-based tool that's just come out. And which one to use might depend on the user, or on the type of query you're interested in, or the type of job we want to run. From another perspective, you might want to store your data in columnar storage for efficient storage and retrieval for particular query types, or in text so that users can write data producers in scripting languages like Perl or Python, or you may want to hook up that HBase table as a data source. As a end-user, I want to use whatever data processing tool is available to me. As a data designer, I want to optimize how data is stored. As a cluster manager/data architect, I want the ability to share pieces of information across the board, and move data back and forth fluidly. HCatalog's hopes and promises are the realization of all of the above.
This is an example that's gone up our web site recently, using Pig to analyse NetFlow packets and so look for origins over time. That's the kind of thing you can only do with large datasets. Using a language like Pig helps you look at the numbers and decide what the next questions to ask are.
This is important. once you start becoming more aware of your customers, your potential customers, your internal state and the world outside -you have more information than ever before.Yet you still need to analyse it.
Conducting valid experiments: A/B testing of two different options must be conducted truly at random, to avoid selection bias or influence by external factorsAccepting negative results: It's OK to have an outcome that says "neither option is any better or worse than the other"Accepting results you don't agree with: evidence your idea doesn't work. no 3, is hard -and why you need large, valid sample sets. Otherwise you could dismiss it as a bad experiment. Governments are classic examples of organisations that don't do this. Badger Culling and Drug Policies are key examples -policy is driven by the belief of constituencies (farmers, daily mail), rather than recognising the evidence and trying to explain to the constituencies that they are mistaken. This isn't a critique of the current administration -the previous one was also belief-driven rather than fact-driven.