How DataCore software radically reduced call center response times at 9-1-1 Emergency Communications of Southern Oregon, with 20-40x improvements. Presented by Sushant Rao of DataCore at Advantech Solution Day on 10/26/17.
Event details: http://www.advantech-eautomation.com/eMarketingPrograms/Server_SolutionDay/
Future Architecture of Streaming Analytics: Capitalizing on the Analytics of ... | DataWorks Summit
The proliferation of connected devices and sensors is leading the Digital Transformation. By 2020 there will be over 20 billion connected devices. Data from these devices needs to be ingested at extreme speeds in order to be analyzed before the data decays. The life cycle of the data is critical in determining which insights can be revealed and how quickly they can be acted upon.
In this session we will look at the past, present and future architecture trends in streaming analytics. We will look at how to turn all the data from devices into actionable insights and dive into recommendations for streaming architecture depending on the data streams and time factor of the data. We will also discuss how to manage all the sensor data, understand the life cycle cost of the data, and scale capacity and capability easily with a modern infrastructure strategy.
LinkedIn leverages the Apache Hadoop ecosystem for its big data analytics. Steady growth of the member base at LinkedIn, along with members' social activities, results in exponential growth of the analytics infrastructure. Innovations in analytics tooling lead to heavier workloads on the clusters, which generate more data, which in turn encourages innovations in tooling and more workloads. Thus, the infrastructure remains under constant growth pressure. Heterogeneous environments, with a variety of hardware and diverse workloads, make the task even more challenging.
This talk will tell the story of how we doubled our Hadoop infrastructure twice in the past two years.
• We will outline our main use cases and historical rates of cluster growth in multiple dimensions.
• We will focus on optimizations, configuration improvements, performance monitoring and architectural decisions we undertook to allow the infrastructure to keep pace with business needs.
• The topics include improvements in HDFS NameNode performance, and fine tuning of block report processing, the block balancer, and the namespace checkpointer.
• We will reveal a study on the optimal storage device for HDFS persistent journals (SATA vs. SAS vs. SSD vs. RAID).
• We will also describe the Satellite Cluster project, which allowed us to double the objects stored on one logical cluster by splitting an HDFS cluster into two partitions without the use of federation and with practically no code changes.
• Finally, we will take a peek at our future goals, requirements, and growth perspectives.
SPEAKERS
Konstantin Shvachko, Sr Staff Software Engineer, LinkedIn
Erik Krogen, Senior Software Engineer, LinkedIn
Uncovering an Apache Spark 2 Benchmark - Configuration, Tuning and Test Results | DataWorks Summit
Apache Spark is increasingly adopted as an alternate processing framework to MapReduce, due to its ability to speed up batch, interactive and streaming analytics. Spark enables new analytics use cases like machine learning and graph analysis with its rich and easy-to-use programming libraries. It also offers the flexibility to run analytics on data stored in Hadoop, across object stores and within traditional databases. This makes Spark an ideal platform for accelerating cross-platform analytics on-premises and in the cloud. Building on the success of the Spark 1.x release, Spark 2.x delivers major improvements in the areas of API, Performance, and Structured Streaming. In this paper, we will cover a high-level view of the Apache Spark framework, and then focus on what we consider to be very important improvements made in Apache Spark 2.x. We will then share the results of a real-world benchmark effort, share details on the Spark and environment configuration changes made to our lab, discuss the results of the benchmark, and provide a reference architecture example for those interested in taking Spark 2.x for their own test drive. This presentation stresses the value of refreshing Spark 1 with Spark 2, as performance testing results show a 2.3x improvement with SparkSQL workloads similar to TPC Benchmark™ DS (TPC-DS).
Speakers
Mark Lochbihler, Principal Architect, Hortonworks
Viplava Madasu, Big Data Systems Engineer, Hewlett Packard Enterprise
YARN Containerized Services: Fading The Lines Between On-Prem And Cloud | DataWorks Summit
Apache Hadoop YARN is the modern distributed operating system for big data applications. In Apache Hadoop 3.1.0, YARN added a service framework that supports long-running services. This new capability goes hand in hand with the recent improvements in YARN to support Docker containers. Together these features have made it significantly easier to bring new applications and services to YARN.
In this talk you will learn about the YARN service framework, its new containerization capabilities and how it lays the foundation for a hybrid and uniform architecture for compute and storage across on-prem and multi-cloud environments. This will include examples highlighting how easy it is to bring applications to the YARN service framework as well as how to containerize applications.
Here's what to expect in this talk:
- Motivation for YARN service framework and containerization
- YARN service framework overview
- YARN service examples
- Containerization overview
- Containerization for Big Data and non Big Data workloads - wait that's everything
4 C’s for Using Cloud to Support Scientific Research | Avere Systems
While cost is a primary "c" driving the adoption of object-based cloud solutions in the life sciences, compute, capacity, and collaboration may all be bigger incentives. In this webinar, we'll examine how to use an Avere Hybrid Cloud NAS infrastructure to gain big benefits in areas like genomics research, personalized medicine, drug discovery, imaging, and other data analysis applications.
• Compute - Building production environments in the compute cloud without rewriting existing applications
• Capacity - Modernizing storage archives and disaster recovery by adding object storage for durability while leveraging existing on-premises NAS
• Collaboration - Using the cloud to safely and securely share data globally
• Cost - Using cloud to lower overall costs to keep pace with fast-growing demands of research initiatives
Hadoop’s capabilities offer untapped potential for business insights but companies often get weighed down with DIY platforms and fail to keep up with the requirements. Join this Dell EMC session which will address this challenge with ready bundles to quickly deliver solutions for ETL offload, Single View, & IoT.
Get more value from your big data:
• Deploy big data applications faster
• Increase business agility
• Confidently deliver high performance and endless scale
• Improve IT operational efficiency
Speaker
Shawn Smith, Big Data Specialist, Dell EMC
The Cisco Open SDN Controller is a commercial distribution of OpenDaylight that delivers business agility through automation of standards-based network infrastructure.
Built as a highly scalable software-defined networking (SDN) platform, the Open SDN Controller abstracts away the complexity of managing heterogeneous networks to improve service delivery and reduce operating costs.
The controller exposes REST APIs that allow other applications to take advantage of the capabilities of the controller and unlock the power of the underlying network infrastructure, and Java APIs that allow for the creation of new network services.
This session will present the basic constructs of the controller and the capabilities of the REST and Java APIs to demonstrate how the Open SDN Controller abstracts away the complexity of managing heterogeneous networks to improve service delivery and reduce operating costs.
Practice of large Hadoop cluster in China Mobile | DataWorks Summit
China Mobile Limited is the leading telecommunications services provider in China, with more than 800 million active users. In China Mobile, distributed big data clusters are built by branch companies in each province for their unique requirements. Meanwhile, we have built a centralized Hadoop cluster of more than 1600 nodes, on which we collect data from dozens of distributed clusters and perform analysis for our business.
In this session, we will introduce the architecture of the centralized Hadoop cluster and our experience constructing and tuning this large-scale Hadoop cluster. Key points are as follows:
1. About Ambari: we have improved Ambari with features like support for HDFS Federation and Ambari HA, improving its performance and enabling it to support up to 1600 nodes.
2. About HDFS: we have built a large HDFS cluster with up to 60PB of data, using federation, ViewFS and FairCallQueue. Our best practices for cluster operation and management will also be included.
3. About Flume: we use a modified Flume to collect as much as 200TB of data per day.
Speakers
Yuxuan Pan, Software Engineer, China Mobile Software Technology
Duan Yunfeng, Chief Designer of China Mobile's big data system, China Mobile Communications Corporation
Disaster Recovery Experience at CACIB: Hardening Hadoop for Critical Financia... | DataWorks Summit
Hadoop is becoming a standard platform for building critical financial applications such as risk reporting, trading and fraud detection. These applications require demanding SLAs (service-level agreements) in terms of RPO (Recovery Point Objective) and RTO (Recovery Time Objective). To achieve these SLAs, organizations need to build a disaster recovery plan that covers several layers, ranging from the infrastructure, through the platform and the applications, to the clients. In this talk, we will present the different architecture blueprints for disaster recovery as well as their corresponding SLA objectives. Then, we will focus on the stretch cluster solution that Crédit Agricole CIB is using in production. We will discuss the solution’s advantages, drawbacks and the impact of this approach on the global architecture. Finally, we will explain in detail how to configure and deploy this solution and how to integrate each layer (storage layer, processing layer...) into the architecture.
This presentation will describe the analytics-to-cloud migration initiative underway at Fannie Mae. The goal of this effort is threefold: (1) build a sustainable process for data lake hydration on the cloud, (2) modernize the Fannie Mae enterprise data warehouse infrastructure, and (3) retire Netezza.
Fannie Mae partnered with Impetus for modernization of its Netezza legacy analytics platform. This involved the use of the Impetus Workload Migration solution—a sophisticated translation engine that automated the migration of their complex Netezza stored procedures, shell and scheduler scripts to Apache Spark compatible scripts. This delivered substantial savings in time, effort and cost, while reducing overall project risk.
Included in the scope of the automation project was an automated assessment capability to perform detailed profiling of the current workloads. The output from the assessment stage was a data-driven offloading blueprint and roadmap for which workloads to migrate. A hybrid cloud-based big data solution was designed based on that. In addition to fulfilling the essential requirement of historical (and incremental) data migration and automated logic translation, the solution also recommends optimal storage formats for the data in the cloud, performing SCD Type 1 and Type 2 for mission-critical parameters and reloading the transformed data back for reporting/analytical consumption.
This will include the following topics:
i. Fannie Mae analytics overview
ii. Why cloud migration for analytics?
iii. Approach, major challenges, lessons learned
Speaker
Kevin Bates, Vice President for Enterprise Data Strategy Execution, Fannie Mae
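The Fannie Mae abstract above mentions SCD (slowly changing dimension) Type 1 and Type 2 handling without showing what the two strategies do. The following is a minimal Python sketch of the general technique, not the Impetus or Fannie Mae implementation. The `Row` class, the field names and the `loan-1` key are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class Row:
    """One version of a dimension record (hypothetical schema)."""
    key: str
    value: str
    valid_from: date
    valid_to: Optional[date] = None  # None marks the current version


def scd_type1(rows: List[Row], key: str, new_value: str) -> List[Row]:
    """Type 1: overwrite the value in place, so no history is kept."""
    return [replace(r, value=new_value) if r.key == key else r for r in rows]


def scd_type2(rows: List[Row], key: str, new_value: str, as_of: date) -> List[Row]:
    """Type 2: close the current version and append a new one, keeping history."""
    out = []
    for r in rows:
        if r.key == key and r.valid_to is None:
            out.append(replace(r, valid_to=as_of))  # expire the old version
        else:
            out.append(r)
    out.append(Row(key, new_value, as_of))  # new current version
    return out


rows = [Row("loan-1", "A", date(2020, 1, 1))]
t1 = scd_type1(rows, "loan-1", "B")
t2 = scd_type2(rows, "loan-1", "B", date(2021, 1, 1))
print(len(t1), len(t2))
```

Type 1 leaves one row per key, while Type 2 grows the table by one row per change, which is the trade-off between storage cost and the ability to report on historical values.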
How is it that one system can query terabytes of data, yet still provide interactive query support? This talk will discuss two of the underlying technologies that allow Apache Hive to support fast query response, both on-premises in HDFS and in cloud object stores such as S3 and WASB.
LLAP was introduced in Hive 2.0. It provides standing processes that securely cache Hive’s columnar data and can do query processing without ever needing to start tasks in Hadoop. We will cover LLAP’s architecture, intended use cases, and performance numbers both on-premises and in the cloud.
The second technology is the integration of Hive with Apache Druid. Druid excels at low-latency, interactive queries over streaming data. Its method of storing data makes it very well suited for OLAP style queries. We will cover how Hive can be integrated with Druid to support real-time streaming of data from Kafka and OLAP queries.
Speaker: Alan Gates, Co-Founder, Hortonworks
The Department of Home Affairs’ initial big data use-cases were standard Hadoop fare: legacy system archival, log data retention and a scalable BLOB store. Along the way we discovered that a platform as open as HDP offered opportunities beyond those exposed by everyone’s favourite asparagus diagram. Here you’ll see how we’ve augmented our Hadoop stack with Solr for text searching, offloading significant work from our Teradata Warehouse in the process; and how we are currently implementing JanusGraph, improving the quality of models developed by our Data Scientists and providing richer data to our Intelligence Analysts – all backed by HDP!
Speakers:
Steven O'Neill, Director EDW platforms, Data Warehouse, Australian Government, Department of Home Affairs
Dwane Hall, Hadoop Developer, Australian Government, Department of Home Affairs
Performance tuning your Hadoop/Spark clusters to use cloud storage | DataWorks Summit
Remote storage provides the ability to separate compute and storage, which ushers in a new world of infinitely scalable and cost-effective storage. Remote storage in the cloud built to the HDFS standard has unique features that make it a great choice for storing and analyzing petabytes of data at a time. Customers can have unlimited storage capacity without any limit on the number or size of files. At such scale, I/O performance becomes an increasingly important consideration when performing analysis on this data. For all workloads, remote storage in the cloud can provide excellent performance when all the different knobs are tuned correctly...
Speaker
Stephen Wu, Senior Program Manager, Microsoft
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi | DataWorks Summit
Using Apache NiFi, we read various open data REST APIs and camera feeds to ingest crime and related data in real time, streaming it into HBase and Phoenix tables. HBase makes an excellent storage option for our real-time time series data sources. We can immediately query our data using Apache Zeppelin against Phoenix tables as well as Hive external tables over HBase.
Apache Phoenix tables also make a great option since we can easily put microservices on top of them for application usage. I have an example Spring Boot application that reads from our Philadelphia crime table for front-end web applications as well as RESTful APIs.
Apache NiFi makes it easy to push records with schemas to HBase and insert into Phoenix SQL tables.
Resources:
https://community.hortonworks.com/articles/54947/reading-opendata-json-and-storing-into-phoenix-tab.html
https://community.hortonworks.com/articles/56642/creating-a-spring-boot-java-8-microservice-to-read.html
https://community.hortonworks.com/articles/64122/incrementally-streaming-rdbms-data-to-your-hadoop.html
Druid and Hive Together: Use Cases and Best Practices | DataWorks Summit
Two popular open source technologies, Druid and Apache Hive, are often mentioned as viable solutions for large-scale analytics. Hive works well for storing large volumes of data, although it is not optimized for ingesting streaming data and making it available for queries in realtime. On the other hand, Druid excels at low-latency, interactive queries over streaming data and at making data available in realtime for queries. Although the high-level messaging presented by both projects may lead you to believe they are competing for the same use case, the technologies are in fact extremely complementary solutions.
By combining the rich query capabilities of Hive with the powerful realtime streaming and indexing capabilities of Druid, we can build more powerful, flexible, and extremely low latency realtime streaming analytics solutions. In this talk we will discuss the motivation for combining Hive and Druid, along with the benefits, use cases, best practices and benchmark numbers.
The agenda of the talk will be:
1. Motivation behind integrating Druid with Hive
2. Druid and Hive together - benefits
3. Use Cases with Demos and architecture discussion
4. Best Practices - Do's and Don'ts
5. Performance vs Cost Tradeoffs
6. SSB Benchmark Numbers
PayPal merchant ecosystem using Apache Spark, Hive, Druid, and HBase | DataWorks Summit
As one of the few closed-loop payment platforms, PayPal is uniquely positioned to provide merchants with insights aimed to identify opportunities to help grow and manage their business. PayPal processes billions of data events every day around our users, risk, payments, web behavior and identity. We are motivated to use this data to enable solutions to help our merchants maximize the number of successful transactions (checkout-conversion), better understand who their customers are and find additional opportunities to grow and attract new customers.
As part of the Merchant Data Analytics, we have built a platform that serves low latency, scalable analytics and insights by leveraging some of the established and emerging platforms to best realize returns on the many business objectives at PayPal.
Join us to learn more about how we leveraged platforms and technologies like Spark, Hive, Druid, Elasticsearch and HBase to process large-scale data for enabling impactful merchant solutions. We’ll share the architecture of our data pipelines, some real dashboards and the challenges involved.
Speakers
Kasiviswanathan Natarajan, Member of Technical Staff, PayPal
Deepika Khera, Senior Manager - Merchant Data Analytics, PayPal
This talk takes you on a rollercoaster ride through Hadoop 2 and explains the most significant changes and components.
The talk was given at the JavaLand conference in Brühl, Germany, on 25.03.2014.
Agenda:
- Welcome Office
- YARN Land
- HDFS 2 Land
- YARN App Land
- Enterprise Land
How to Integrate Hyperconverged Systems with Existing SANs | DataCore Software
Hyperconverged systems offer a great deal of promise and yet come with a set of limitations.
While they allow enterprises to re-integrate system components into a single enclosure and reduce the physical complexity, floor space and cost of supporting a workload in the data center, they also often will not support existing storage in local SANs or offered by cloud service providers.
However, there are solutions available to address these challenges and allow hyperconverged systems to realize their promise. Sign up to discover:
• What are hyperconverged systems?
• What challenges do they pose?
• What should the ideal solution to those challenges look like?
• A solution that helps integrate hyperconverged systems with existing SANs
From Disaster to Recovery: Preparing Your IT for the Unexpected | DataCore Software
Did you know that 22% of data center outages are caused by human error? Or that 10% are caused by weather incidents?
The impact of an unexpected outage for just a few hours or even days could be catastrophic to your business.
How would you like to minimize or even eliminate these business interruptions, and more?
Join us to discover:
• Useful and simple measures to use that can help you keep the lights on
• How to quickly recover when the worst-case scenario occurs
• How to achieve zero downtime and high availability
Disaster Recovery Experience at CACIB: Hardening Hadoop for Critical Financia...DataWorks Summit
Hadoop is becoming a standard platform for building critical financial applications such as risk reporting, trading and fraud detection. These applications require high level of SLAs (service-level agreement) in terms of RPO (Recovery Point Objective) and RTO (Recovery Time Objective). To achieve these SLAs, organizations need to build a disaster recovery plan that cover several layers ranging from the infrastructure to the clients going through the platform and the applications. In this talk, we will present the different architecture blueprints for disaster recovery as well as their corresponding SLA objectives. Then, we will focus on the stretch cluster solution that Crédit Agricole CIB is using in production. We will discuss the solution’s advantages, drawbacks and the impact of this approach on the global architecture. Finally, we will explain in detail how to configure and deploy this solution and how to integrate each layer (storage layer, processing layer...) into the architecture.
This presentation will describe the analytics-to-cloud migration initiative underway at Fannie Mae. The goal of this effort is threefold: (1) build a sustainable process for data lake hydration on the cloud and (2) modernize the Fannie Mae enterprise data warehouse infrastructure and (3) retire Netezza.
Fannie Mae partnered with Impetus for modernization of its Netezza legacy analytics platform. This involved the use of the Impetus Workload Migration solution—a sophisticated translation engine that automated the migration of their complex Netezza stored procedures, shell and scheduler scripts to Apache Spark compatible scripts. This delivered substantial savings in time, effort and cost, while reducing overall project risk.
Included in the scope of the automation project was an automated assessment capability to perform detailed profiling of the current workloads. The output from the assessment stage was a data-driven offloading blueprint and roadmap for which workloads to migrate. A hybrid cloud-based big data solution was designed based on that. In addition to fulfilling the essential requirement of historical (and incremental) data migration and automated logic translation, the solution also recommends optimal storage formats for the data in the cloud, performing SCD Type 1 and Type 2 for mission-critical parameters and reloading the transformed data back for reporting/analytical consumption.
This will include the following topics:
i. Fannie Mae analytics overview
ii. Why cloud migration for analytics?
iii. Approach, major challenges, lessons learned
Speaker
Kevin Bates, Vice President for Enterprise Data Strategy Execution, Fannie Mae
How is it that one system can query terabytes of data, yet still provide interactive query support? This talk will discuss two of the underlying technologies that allow Apache Hive to support fast query response, both on-premise in HDFS and in cloud object stores such as S3 and WASB.
LLAP was introduced in Hive 2.0. It provides standing processes that securely cache Hive’s columnar data and can do query processing without ever needing to start tasks in Hadoop. We will cover LLAP’s architecture, intended use cases, and performance numbers for both on-premise and in the cloud.
The second technology is the integration of Hive with Apache Druid. Druid excels at low-latency, interactive queries over streaming data. Its method of storing data makes it very well suited for OLAP style queries. We will cover how Hive can be integrated with Druid to support real-time streaming of data from Kafka and OLAP queries.
Speaker: Alan Gates, Co-Founder, Hortonworks
The Department of Home Affairs’ initial big data use-cases were standard Hadoop fare: legacy system archival, log data retention and a scalable BLOB store. Along the way we discovered that a platform as open as HDP offered opportunities beyond those exposed by everyone’s favourite asparagus diagram. Here you’ll see how we’ve augmented our Hadoop stack with Solr for text searching, offloading significant work from our Teradata Warehouse in the process; and how we are currently implementing JanusGraph, improving the quality of models developed by our Data Scientists and providing richer data to our Intelligence Analysts – all backed by HDP!
Speakers:
Steven O'Neill, Director EDW platforms, Data Warehouse, Australian Government, Department of Home Affairs
Dwane Hall, Hadoop Developer, Australian Government, Department of Home Affairs
Performance tuning your Hadoop/Spark clusters to use cloud storage DataWorks Summit
Remote storage provides the ability to separate compute and storage, which ushers in a new world of infinitely scalable and cost-effective storage. Remote storage in the cloud built to the HDFS standard has unique features that make it a great choice for storing and analyzing petabytes of data at a time. Customers can have unlimited storage capacity without any limit to the number or size of the files. With such scale, superior I/O performance becomes an increasingly important consideration when performing analysis on this data. For all workloads, a remote storage in the cloud can provide amazing performance when all the different knobs are tuned correctly...
Speaker
Stephen Wu, Senior Program Manager, Microsoft
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi DataWorks Summit
Using Apache NiFi, we read various open data REST APIs and camera feeds to ingest crime and related data, streaming it in real time into HBase and Phoenix tables. HBase makes an excellent storage option for our real-time time series data sources. We can immediately query our data with Apache Zeppelin against Phoenix tables as well as Hive external tables on HBase.
Apache Phoenix tables also make a great option since we can easily put microservices on top of them for application usage. I have an example Spring Boot application that reads from our Philadelphia crime table for front-end web applications as well as RESTful APIs.
Apache NiFi makes it easy to push records with schemas to HBase and insert into Phoenix SQL tables.
Resources:
https://community.hortonworks.com/articles/54947/reading-opendata-json-and-storing-into-phoenix-tab.html
https://community.hortonworks.com/articles/56642/creating-a-spring-boot-java-8-microservice-to-read.html
https://community.hortonworks.com/articles/64122/incrementally-streaming-rdbms-data-to-your-hadoop.html
Druid and Hive Together : Use Cases and Best Practices DataWorks Summit
Two popular open source technologies, Druid and Apache Hive, are often mentioned as viable solutions for large-scale analytics. Hive works well for storing large volumes of data, although it is not optimized for ingesting streaming data and making it available for queries in realtime. On the other hand, Druid excels at low-latency, interactive queries over streaming data and at making data available in realtime for queries. Although the high-level messaging presented by both projects may lead you to believe they are competing for the same use case, the technologies are in fact extremely complementary solutions.
By combining the rich query capabilities of Hive with the powerful realtime streaming and indexing capabilities of Druid, we can build more powerful, flexible, and extremely low latency realtime streaming analytics solutions. In this talk we will discuss the motivation for combining Hive and Druid, along with the benefits, use cases, best practices and benchmark numbers.
The agenda of the talk:
1. Motivation behind integrating Druid with Hive
2. Druid and Hive together - benefits
3. Use Cases with Demos and architecture discussion
4. Best Practices - Do's and Don'ts
5. Performance vs Cost Tradeoffs
6. SSB Benchmark Numbers
PayPal merchant ecosystem using Apache Spark, Hive, Druid, and HBase DataWorks Summit
As one of the few closed-loop payment platforms, PayPal is uniquely positioned to provide merchants with insights aimed to identify opportunities to help grow and manage their business. PayPal processes billions of data events every day around our users, risk, payments, web behavior and identity. We are motivated to use this data to enable solutions to help our merchants maximize the number of successful transactions (checkout-conversion), better understand who their customers are and find additional opportunities to grow and attract new customers.
As part of the Merchant Data Analytics, we have built a platform that serves low latency, scalable analytics and insights by leveraging some of the established and emerging platforms to best realize returns on the many business objectives at PayPal.
Join us to learn more about how we leveraged platforms and technologies like Spark, Hive, Druid, Elastic Search and HBase to process large scale data for enabling impactful merchant solutions. We’ll share the architecture of our data pipelines, some real dashboards and the challenges involved.
Speakers
Kasiviswanathan Natarajan, Member of Technical Staff, PayPal
Deepika Khera, Senior Manager - Merchant Data Analytics, PayPal
This talk takes you on a rollercoaster ride through Hadoop 2 and explains the most significant changes and components.
The talk was given at the JavaLand conference in Brühl, Germany, on 25.03.2014.
Agenda:
- Welcome Office
- YARN Land
- HDFS 2 Land
- YARN App Land
- Enterprise Land
How to Integrate Hyperconverged Systems with Existing SANs DataCore Software
Hyperconverged systems offer a great deal of promise and yet come with a set of limitations.
While they allow enterprises to re-integrate system components into a single enclosure and reduce the physical complexity, floor space and cost of supporting a workload in the data center, they also often will not support existing storage in local SANs or offered by cloud service providers.
However, there are solutions available to address these challenges and allow hyperconverged systems to realize their promise. Sign up to discover:
• What are hyperconverged systems?
• What challenges do they pose?
• What should the ideal solution to those challenges look like?
• A solution that helps integrate hyperconverged systems with existing SANs
From Disaster to Recovery: Preparing Your IT for the Unexpected DataCore Software
Did you know that 22% of data center outages are caused by human error? Or that 10% are caused by weather incidents?
The impact of an unexpected outage for just a few hours or even days could be catastrophic to your business.
How would you like to minimize or even eliminate these business interruptions, and more?
Join us to discover:
• Useful and simple measures to use that can help you keep the lights on
• How to quickly recover when the worst-case scenario occurs
• How to achieve zero downtime and high availability
DataCore Software introduction from my "Meet DataCore" webinar. DataCore products include software-defined storage and hyperconverged infrastructure solutions. DataCore has more than 10K customers and 30K+ implementations worldwide.
Many companies have discovered that there is “gold” in their server log files and machine data. Closely monitoring this data can improve security, help prevent costly outages and reduce the time it takes to recover from a problem. In this presentation, GTRI’s Micah Montgomery explains how operational intelligence can be gained from machine data, and how Splunk Enterprise can turn this data into actionable insights. Also presenting was NetApp’s Steve Fritzinger, who discussed how to manage the challenges of capturing and storing a flood of data without breaking the bank.
Presented at "Denver Big Data Analytics Day" on May 18, 2016 at GTRI.
EUT302_Data Ingestion at Seismic Scale Best Practices for Processing Petabyte... Amazon Web Services
"With geoseismic datasets that are petabytes in size and growing, finding tomorrow's energy is increasingly data and compute intensive. Hess Corporation, a global energy company, needed to be able to respond quickly to changing oil market demands, while minimizing costs. By migrating petabytes of data and running high performance computing (HPC) workloads on AWS, Hess reduced compute costs and accelerated time in which geologists received results.
In this session, you will learn how Hess built a GeoSeismic data repository on AWS, by leveraging S3 and EFS, and processes that data by building HPC clusters on-demand using the GPU-enabled P2 instance family. Additionally, you will learn how the Hess subsurface computing team was able to move from running on premise cap-ex driven GPU clusters to an op-ex driven on-demand model in the AWS cloud."
Delivering Apache Hadoop for the Modern Data Architecture Hortonworks
Join Hortonworks and Cisco as we discuss trends and drivers for a modern data architecture. Our experts will walk you through some key design considerations when deploying a Hadoop cluster in production. We'll also share practical best practices around Cisco-based big data architectures and Hortonworks Data Platform to get you started on building your modern data architecture.
NVMe and all-flash systems can solve any performance, floor space and energy problem. At least, this is the marketing message many vendors and analysts spread today. It sounds too good to be true, right?
As always in real life, nothing is purely black or white, and there are some circumstances you should be aware of, especially if you intend to leverage these technologies.
You may ask yourself: Do I need to rip and replace my existing storage? What is the best way to integrate both? What benefits do I receive?
Join our brief webinar, which also includes a live demo and audience Q&A, so you can get the most out of these technologies, make your storage great again and discover:
• How to integrate Flash over NVMe in real life
• How to benefit from some Flash/NVMe across all of your applications
RapidScale's DaaS offering allows companies of all sizes to move their desktops into the cloud, eliminating PC maintenance, support, and life cycle management.
This overview provides insight into the ODA Engineered System. It outlines how the ODA is Simple, Optimised and Affordable to implement for all organisations.
Contact me to find out more:
E-mail:daryll.whyte@oracle.com
Phone: +441189244490
Twitter: @daryllwhyte
LinkedIn: https://ie.linkedin.com/in/daryllwhyte
Website- Oracle ODA: https://www.oracle.com/oda
WEBINAR: Architectures for Digital Transformation and Next-Generation Systems... Aerospike, Inc.
Containers are great ephemeral vessels for your applications. But what about the data that drives your business? It must survive containers coming and going, maintain its availability and reliability, and grow when you need it.
Alvin Richards reviews a number of strategies to deal with persistent containers and discusses where the data can be stored and how to scale the persistent container layer. Alvin includes code samples and interactive demos showing the power of Docker Machine, Engine, Swarm, and Compose, before demonstrating how to combine them with multihost networking to build a reliable, scalable, and production-ready tier for the data needs of your organization.
HPC DAY 2017 | Altair's PBS Pro: Your Gateway to HPC Computing HPC DAY
HPC DAY 2017 - http://www.hpcday.eu/
Altair's PBS Pro: Your Gateway to HPC Computing
Dr. Jochen Krebs | Director Enterprise Sales Central & Eastern Europe at Altair
Traditional data warehouses become expensive and slow down as the volume of your data grows. Amazon Redshift is a fast, petabyte-scale data warehouse that makes it easy to analyze all of your data using existing business intelligence tools for 1/10th the traditional cost. This session will provide an introduction to Amazon Redshift and cover the essentials you need to deploy your data warehouse in the cloud so that you can achieve faster analytics and save costs. We’ll also cover the recently announced Redshift Spectrum, which allows you to query unstructured data directly from Amazon S3.
Creating An Enterprise Storage Strategy for Healthcare DataCore Software
Healthcare CIOs face many challenges including maintaining continuous operations around the clock, managing data growth, and delivering the application performance required to provide high quality professional healthcare services.
In this presentation, learn how Mission Community Hospital’s CIO leveraged DataCore’s software-defined storage platform to create their enterprise storage strategy and achieve the following results:
• Over 2 years of 24x7x365 operations without any data outages
• 87% reduction in the amount of time to provision storage capacity
• Up to 5x performance increase for their SQL, PACS, and Oracle database system
Turning Petabytes of Data into Profit with Hadoop for the World’s Biggest Ret... Cloudera, Inc.
PRGX is the world's leading provider of accounts payable audit services and works with leading global retailers. As new forms of data started to flow into their organizations, standard RDBMS systems were not allowing them to scale. Now, by using Talend with Cloudera Enterprise, they are able to achieve a 9-10x performance benefit in processing data, reduce errors, and provide more innovative products and services to end customers.
Watch this webinar to learn how PRGX worked with Cloudera and Talend to create a high-performance computing platform for data analytics and discovery that allows them to rapidly process, model, and serve massive amounts of structured and unstructured data.
How do you manage performance in the cloud, in particular in "Platform as a Service" (PaaS) environments like Windows Azure or Heroku, where you don't have a "virtual machine" to manage?
Even in "Infrastructure as a Service" (IaaS) environments like Amazon EC2, there are limitations on the tools you can deploy into that environment to assist in performance management, troubleshooting, etc. (e.g. you can't deploy promiscuous mode network sniffing tools in EC2).
James Smith from Adactus will give us an overview of Cloud Services as a whole, and then drill down into some of the issues they have experienced in deploying their "Pulse" Claims Management Solution into the Azure cloud (http://www.pulseclaims.com/home).
Beyond just looking at page speed performance, he'll talk about the challenges involved in managing SLAs, Cloud "support" (or lack of it!), performance troubleshooting and the whole "performance lifecycle".
Amazon Kinesis is a fully managed service for real-time processing of streaming data at massive scale. Amazon Kinesis can collect and process hundreds of terabytes of data per hour from hundreds of thousands of sources, allowing you to easily write applications that process information in real-time, from sources such as web site click-streams, marketing and financial information, manufacturing instrumentation and social media, and operational logs and metering data.
This introductory webinar, presented by Adi Krishnan, Senior Product Manager for Amazon Kinesis, will provide you with an overview of the service, sample use cases, and some examples of customer experiences with the service so you can better understand its capabilities and see how it might be integrated into your own applications.
Nvidia's 10/26/17 Advantech Solution Day presentation by Adam Scraba. Showcases examples of how Nvidia powered AI solutions increase public safety through video analysis and object recognition to significantly speed surveillance camera footage review and search.
Full details and tech talk video: http://www.advantech-eautomation.com/eMarketingPrograms/Server_SolutionDay/
How Intel Xeon CPUs enable smarter IoT solutions across various power and industrial use case requirements. Intel's 10/26/17 Advantech Solution Day presentation by Stephen Chenoweth.
Event details: http://www.advantech-eautomation.com/eMarketingPrograms/Server_SolutionDay/
Future of server market. Advantech servers for industrial compute and storage applications. 10/26/17 Advantech Solution Day presentation by Frank Chiang, Industrial Server PM.
Event details: http://www.advantech-eautomation.com/eMarketingPrograms/Server_SolutionDay/
NVMe revolutionizes Cloud and Edge computing for Industry 4.0. 10/26/17 Advantech Solution Day presentation by Frank Chiang, Industrial Server PM.
Event details: http://www.advantech-eautomation.com/eMarketingPrograms/Server_SolutionDay/
Industrial HMI is constantly evolving, changing the world of automation. The evolution from tiny keyboards to large, multi-touch panels for automation devices has been a whirlwind. This presentation highlights some of HMI's greatest features and developments.
The New Frontiers of AI in RPA with UiPath Autopilot™ UiPathCommunity
In this free online event, organized by the UiPath Italian Community, you can explore the new features of Autopilot, the tool that integrates Artificial Intelligence into the development and use of automations.
📕 Together we will look at some examples of using Autopilot in different tools of the UiPath Suite:
Autopilot for Studio Web
Autopilot for Studio
Autopilot for Apps
Clipboard AI
GenAI applied to Document Understanding
👨🏫👨💻 Speakers:
Stefano Negro, UiPath MVPx3, RPA Tech Lead @ BSP Consultant
Flavio Martinelli, UiPath MVP 2023, Technical Account Manager @UiPath
Andrei Tasca, RPA Solutions Team Lead @NTT Data
Transcript: Selling digital books in 2024: Insights from industry leaders - T... BookNet Canada
The publishing industry has been selling digital audiobooks and ebooks for over a decade and has found its groove. What’s changed? What has stayed the same? Where do we go from here? Join a group of leading sales peers from across the industry for a conversation about the lessons learned since the popularization of digital books, best practices, digital book supply chain management, and more.
Link to video recording: https://bnctechforum.ca/sessions/selling-digital-books-in-2024-insights-from-industry-leaders/
Presented by BookNet Canada on May 28, 2024, with support from the Department of Canadian Heritage.
Elevating Tactical DDD Patterns Through Object Calisthenics Dorra BARTAGUIZ
After immersing yourself in the blue book and its red counterpart, attending DDD-focused conferences, and applying tactical patterns, you're left with a crucial question: How do I ensure my design is effective? Tactical patterns within Domain-Driven Design (DDD) serve as guiding principles for creating clear and manageable domain models. However, achieving success with these patterns requires additional guidance. Interestingly, we've observed that a set of constraints initially designed for training purposes remarkably aligns with effective pattern implementation, offering a more ‘mechanical’ approach. Let's explore together how Object Calisthenics can elevate the design of your tactical DDD patterns, offering concrete help for those venturing into DDD for the first time!
Generative AI Deep Dive: Advancing from Proof of Concept to Production Aggregage
Join Maher Hanafi, VP of Engineering at Betterworks, in this new session where he'll share a practical framework to transform Gen AI prototypes into impactful products! He'll delve into the complexities of data collection and management, model selection and optimization, and ensuring security, scalability, and responsible use.
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024 Albert Hoitingh
In this session I delve into the encryption technology used in Microsoft 365 and Microsoft Purview, including the concepts of Customer Key and Double Key Encryption.
DevOps and Testing slides at DASA Connect Kari Kakkonen
Slides by Rik Marselis and me from the DASA Connect conference on 30.5.2024. We discuss what testing is, then what agile testing is, and finally what testing in DevOps is. We closed with a lovely workshop in which the participants explored different ways to think about quality and testing in different parts of the DevOps infinity loop.
UiPath Test Automation using UiPath Test Suite series, part 4 DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 4. In this session, we will cover Test Manager overview along with SAP heatmap.
The UiPath Test Manager overview with SAP heatmap webinar offers a concise yet comprehensive exploration of the role of a Test Manager within SAP environments, coupled with the utilization of heatmaps for effective testing strategies.
Participants will gain insights into the responsibilities, challenges, and best practices associated with test management in SAP projects. Additionally, the webinar delves into the significance of heatmaps as a visual aid for identifying testing priorities, areas of risk, and resource allocation within SAP landscapes. Through this session, attendees can expect to enhance their understanding of test management principles while learning practical approaches to optimize testing processes in SAP environments using heatmap visualization techniques
What will you get from this session?
1. Insights into SAP testing best practices
2. Heatmap utilization for testing
3. Optimization of testing processes
4. Demo
Topics covered:
Execution from the test manager
Orchestrator execution result
Defect reporting
SAP heatmap example with demo
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do... UiPathCommunity
💥 Speed, accuracy, and scaling – discover the superpowers of GenAI in action with UiPath Document Understanding and Communications Mining™:
See how to accelerate model training and optimize model performance with active learning
Learn about the latest enhancements to out-of-the-box document processing – with little to no training required
Get an exclusive demo of the new family of UiPath LLMs – GenAI models specialized for processing different types of documents and messages
This is a hands-on session specifically designed for automation developers and AI enthusiasts seeking to enhance their knowledge in leveraging the latest intelligent document processing capabilities offered by UiPath.
Speakers:
👨🏫 Andras Palfi, Senior Product Manager, UiPath
👩🏫 Lenka Dulovicova, Product Program Manager, UiPath
Welcome to ViralQR, your best QR code generator. ViralQR
Welcome to ViralQR, your best QR code generator available on the market!
At ViralQR, we design static and dynamic QR codes. Our mission is to make business operations easier and customer engagement more powerful through the use of QR technology. Be it a small-scale business or a huge enterprise, our easy-to-use platform provides multiple choices that can be tailored according to your company's branding and marketing strategies.
Our Vision
We are here to make the process of creating QR codes easy and smooth, thus enhancing customer interaction and making business more fluid. We very strongly believe in the ability of QR codes to change the world for businesses in their interaction with customers and are set on making that technology accessible and usable far and wide.
Our Achievements
Since our inception, we have served many clients across various industries, offering QR codes for marketing, service delivery, and feedback collection. Our platform has been recognized for its ease of use and its features, which help businesses create QR codes.
Our Services
ViralQR offers a comprehensive suite of services that caters to your needs:
Static QR Codes: Create free static QR codes. These QR codes are able to store significant information such as URLs, vCards, plain text, emails and SMS, Wi-Fi credentials, and Bitcoin addresses.
Dynamic QR codes: These also have all the advanced features but are subscription-based. They can directly link to PDF files, images, micro-landing pages, social accounts, review forms, business pages, and applications. In addition, they can be branded with CTAs, frames, patterns, colors, and logos to enhance your branding.
Pricing and Packages
Additionally, ViralQR has a 14-day free offer, an opportunity for new users to get a feel for the platform. From there, users can subscribe and experience the full range of dynamic QR code features. The subscription plans are flexibly priced so that every business can afford to benefit from our service.
Why choose us?
ViralQR provides services for marketing, advertising, catering, retail, and the like. The QR codes can be posted on fliers, packaging, merchandise, and banners, and can substitute for cash and cards in a restaurant or coffee shop. With QR codes integrated into your business, you can improve customer engagement and streamline operations.
Comprehensive Analytics
Subscribers of ViralQR receive detailed analytics and tracking tools that give insight into QR code performance. Our analytics dashboard shows aggregate views and unique views, as well as detailed information about each impression, including time, device, browser, and estimated location by city and country.
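To make the distinction between aggregate and unique views concrete, here is a small sketch that derives both from a list of scan impressions. It is not ViralQR's implementation: the record fields, the sample data, and the choice to define "unique" by a distinct `visitor` identifier are all assumptions for illustration.

```python
from collections import Counter


def summarize(impressions: list) -> dict:
    """Aggregate views count every scan; unique views count distinct visitors."""
    return {
        "total_views": len(impressions),
        "unique_views": len({imp["visitor"] for imp in impressions}),
        "by_country": dict(Counter(imp["country"] for imp in impressions)),
        "by_device": dict(Counter(imp["device"] for imp in impressions)),
    }


# Hypothetical impressions, for illustration only.
impressions = [
    {"visitor": "a", "device": "mobile", "country": "US"},
    {"visitor": "a", "device": "mobile", "country": "US"},
    {"visitor": "b", "device": "desktop", "country": "DE"},
    {"visitor": "c", "device": "mobile", "country": "US"},
]
summary = summarize(impressions)
print(summary)
```

Here visitor `a` scanned twice, so the four impressions come from three unique visitors.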
Thank you for choosing ViralQR; we offer nothing but the best in QR code services to meet diverse business needs!
Epistemic Interaction - tuning interfaces to provide information for AI support Alan Dix
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
State of ICS and IoT Cyber Threat Landscape Report 2024 preview Prayukth K V
The IoT and OT threat landscape report has been prepared by the Threat Research Team at Sectrio using data from Sectrio's cyber threat intelligence farming facilities, spread across over 85 cities around the world. In addition, Sectrio also runs AI-based advanced threat and payload engagement facilities that serve as sinks to attract and engage sophisticated threat actors and newer malware, including new variants and latent threats that are at an earlier stage of development.
The latest edition of the OT/ICS and IoT security Threat Landscape Report 2024 also covers:
State of global ICS asset and network exposure
Sectoral targets and attacks as well as the cost of ransom
Global APT activity, AI usage, actor and tactic profiles, and implications
Rise in volumes of AI-powered cyberattacks
Major cyber events in 2024
Malware and malicious payload trends
Cyberattack types and targets
Vulnerability exploit attempts on CVEs
Attacks on counties – USA
Expansion of bot farms – how, where, and why
In-depth analysis of the cyber threat landscape across North America, South America, Europe, APAC, and the Middle East
Why are attacks on smart factories rising?
Cyber risk predictions
Axis of attacks – Europe
Systemic attacks in the Middle East
Download the full report from here:
https://sectrio.com/resources/ot-threat-landscape-reports/sectrio-releases-ot-ics-and-iot-security-threat-landscape-report-2024/