This document discusses how Sears Holdings leveraged Hadoop to address several big data challenges. It provides 4 examples where migrating legacy workloads and processes to Hadoop enabled: [1] Calculating price elasticity on their entire data set weekly instead of quarterly on a subset; [2] Scaling a complex batch process 100x to handle billions of records; [3] Reducing a batch processing window from 3.5 to 1.5 hours; [4] Enhancing user experience by enabling direct querying of data in Hadoop instead of legacy systems. The document emphasizes that Hadoop must be part of an overall data strategy and ecosystem to fully realize benefits for the enterprise.
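Price elasticity measures how the quantity sold responds to price: the percentage change in quantity divided by the percentage change in price. One common way to estimate it from sales history is the slope of a regression of ln(quantity) on ln(price). The sketch below assumes that log-log approach and uses hypothetical data and a hypothetical `log_log_elasticity` helper; it is not the model Sears used. Because each item's estimate is independent, a calculation like this parallelizes naturally across a cluster, which is plausibly part of what makes running it weekly on the full data set feasible.

```python
import math


def log_log_elasticity(prices, quantities):
    """Estimate price elasticity as the OLS slope of ln(quantity) on ln(price)."""
    if len(prices) != len(quantities) or len(prices) < 2:
        raise ValueError("need at least two paired observations")
    xs = [math.log(p) for p in prices]
    ys = [math.log(q) for q in quantities]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Slope = covariance(x, y) / variance(x)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    if var == 0:
        raise ValueError("prices must vary to estimate elasticity")
    return cov / var


# Hypothetical weekly observations generated from quantity = 1000 * price ** -1.5
prices = [10.0, 12.0, 15.0, 20.0]
quantities = [1000 * p ** -1.5 for p in prices]
print(round(log_log_elasticity(prices, quantities), 3))  # -1.5
```

A result below -1 means demand is elastic: a 1% price increase reduces units sold by more than 1%.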
Move to Hadoop, Go Faster and Save Millions - Mainframe Legacy Modernization (DataWorks Summit)
Despite recent advances in computing, many core business processes are still batch-oriented and run on mainframes. Annual mainframe costs run to six figures or more and can grow with capacity needs. To tackle the cost challenge, many organizations have considered or attempted multi-year mainframe migration or re-hosting strategies. Traditional approaches to mainframe elimination call for large initial investments and carry significant risks, because it is hard to match mainframe performance and reliability. Using Hadoop, Sears/MetaScale developed an innovative alternative that migrates batch processing to Hadoop without the risks, time and costs of other methods. This solution has been adopted in multiple businesses with excellent results and associated cost savings, as mainframes are physically eliminated or downsized. Savings of millions of dollars based on MIPS reductions have been seen: a reduction of 200 MIPS can yield $1 million in annual savings. MetaScale eliminated over 900 MIPS and an entire mainframe system for one Fortune 500 client. This presentation illustrates the reference architecture and approach MetaScale successfully used to move mainframe processing to the Hadoop platform without altering user-facing business applications.
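The rule of thumb quoted above works out to $5,000 per MIPS per year ($1,000,000 / 200 MIPS). Scaling it linearly gives a rough sense of what other reductions might be worth. The linear scaling is an assumption (mainframe pricing can be tiered rather than linear), `annual_savings` is a hypothetical helper, and the 900 MIPS result is only an extrapolation, not a savings figure reported for MetaScale's client.

```python
def annual_savings(mips_reduced, reference_mips=200, reference_savings=1_000_000):
    """Linearly scale the quoted rule of thumb ($1M per 200 MIPS per year).

    Assumption: savings are proportional to MIPS removed.
    """
    return mips_reduced * reference_savings / reference_mips


print(annual_savings(200))  # 1000000.0
print(annual_savings(900))  # 4500000.0
```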
How to Leverage Mainframe Data with Hadoop: Bridging the Gap Between Big Iron... (Precisely)
In this presentation from Syncsort and Cloudera, you'll learn how to bridge the technical, skill and cost gaps between mainframe and Hadoop. We discuss the top challenges of ingesting and processing mainframe data in Hadoop – and how to solve them.
Data on the Move: Transitioning from a Legacy Architecture to a Big Data Plat... (MapR Technologies)
Atzmon Hen-Tov & Lior Schachter, Pontis
Businesses everywhere are increasingly challenged by their dependencies on legacy platforms. The dramatic increase in data volume, speed, and types of data is quickly outstripping the capabilities of these legacy systems. By transitioning from a legacy RDBMS to a Hadoop-based platform, Pontis was able to process and analyze billions of mobile subscriber events every day. In this talk, we'll provide a quick overview of our legacy system, as well as our process for migrating to our target architecture. We'll continue with a review of our Hadoop platform selection process, which involved a thorough RFP and a detailed analysis of the top Hadoop platform vendors. This session will focus on how we gradually transitioned to our big data platform over the course of several product versions, achieving higher scalability and a lower TCO with each version. We'll outline the benefits of the target architecture, and detail how we successfully integrated Hadoop into our organization. Our session will conclude with a look at technical solutions for dealing with big data deficiencies.
Klaus Gottschalk from IBM presented this deck at the 2016 HPC Advisory Council Switzerland Conference.
"Last year IBM, together with partners from the OpenPOWER Foundation, won two of the multi-year contracts of the US CORAL program. Within these contracts IBM is developing an accelerated HPC infrastructure and software development ecosystem that will be a major step towards Exascale computing. We believe that the CORAL roadmap will create a massive pull toward transforming HPC codes for accelerated systems. The talk will discuss the IBM HPC strategy, explain the OpenPOWER Foundation and show the IBM OpenPOWER roadmap for CORAL and beyond."
Watch the video presentation: http://wp.me/p3RLHQ-f9x
Learn more: http://e.huawei.com/us/solutions/business-needs/data-center/high-performance-computing
See more talks from the Switzerland HPC Conference:
http://insidehpc.com/2016-swiss-hpc-conference/
We cover the IBM solution for HPC. In addition to the hardware and software stack, we show how a rational choice of compilation and runtime parameters can significantly improve the performance of technical computing applications.
Consolidate your SAP System Landscape - TechEd && d-code 2014 (Goetz Lessmann)
My slide deck from this year's SAP TechEd && d-code on how to consolidate SAP system landscapes, both for SAP ERP and SAP BW (and actually any other SAP-driven systems). The focus is on getting rid of some misconceptions about consolidations and focusing on solutions instead of problems to achieve tangible goals: TCO savings, quick wins, and a clear way of going for a one-SAP landscape.
Timely access to relevant information has always been critical to business success. Thousands and thousands of companies and institutions use SAP NetWeaver Business Warehouse (SAP NetWeaver BW) as the cornerstone for business intelligence in their SAP application landscapes. However, query performance has often been a challenge...
This technical paper explores how Voith IT Solutions has cut costs and improved system performance by migrating its SAP applications to IBM DB2 on the IBM Power platform. With support from SAP, IBM and IBM Premier Business Partner SVA, Voith has reduced total system administration effort, cut total data storage volumes through the use of IBM DB2 Deep Compression, and introduced Unicode for international working.
At Clearwire we have a big data challenge: processing millions of unique usage records, comprising terabytes of data, for millions of customers every week. Historically, massive purpose-built database solutions were used to process the data, but they weren't particularly fast, nor did they lend themselves to analysis. As mobile data volumes increase exponentially, we needed a scalable solution that could process usage data for billing, provide a data analysis platform, and inexpensively store the data indefinitely. The solution? A Hadoop-based platform allowed us to architect and deploy an end-to-end solution, based on a combination of physical data nodes and virtual edge nodes, in less than six months. This solution allowed us to turn off our legacy usage processing solution and reduce processing times from hours to as little as 15 minutes. This improvement has enabled Clearwire to deliver actionable usage data to partners faster and more predictably than ever before. Usage processing was just the beginning; we're now turning to the raw data stored in Hadoop, adding new data sources, and starting to analyze the data. Clearwire is now able to put multiple data sources in the hands of our analysts for further discovery and actionable intelligence.
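The abstract does not describe Clearwire's actual jobs, but usage processing for billing is commonly expressed as a MapReduce-style aggregation: map each usage record to a (customer, usage) pair, group the pairs by customer, and sum each group. The pure-Python sketch below simulates those three phases in a single process; the `customer_id,bytes_used` record format and the function names are hypothetical. On a real cluster the map and reduce phases run in parallel across data nodes and the framework performs the shuffle.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Parse hypothetical 'customer_id,bytes_used' lines into key/value pairs."""
    for line in lines:
        customer_id, bytes_used = line.strip().split(",")
        yield customer_id, int(bytes_used)


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group all values by key, as the framework does between map and reduce."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Sum the usage for each customer."""
    return {key: sum(values) for key, values in groups.items()}


records = ["c1,500", "c2,300", "c1,250", "c3,100", "c2,700"]
totals = reduce_phase(shuffle_phase(map_phase(records)))
print(totals)  # {'c1': 750, 'c2': 1000, 'c3': 100}
```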
Fully leveraging your data, infrastructure, and IT staff has never been more important than it is now, during these times of fiscal responsibility and evolving business demands. In response, businesses need to maximize their IT by getting increased performance, efficiency, and economics out of their infrastructure and resources.
This presentation focuses on three key technologies that provide particularly compelling opportunities to maximize IT:
- All-flash systems that accelerate access to information for faster decision-making, analysis and productivity.
- Unified storage solutions that enable you to process more, and more diverse, workloads in less time while driving capacity efficiencies.
- Unified compute solutions that deliver improved orchestration and automation and enhance the productivity of your IT staff, while avoiding costly over- or under-provisioning.
Virtualizing SAP HANA with Hitachi Unified Compute Platform Solutions: Bring... (Hitachi Vantara)
Virtualizing SAP HANA with Hitachi Unified Compute Platform Solutions: Bringing Flexibility, Agility and Readiness to the Real-Time Enterprise. VMworld 2015
Which Change Data Capture Strategy is Right for You? (Precisely)
Change Data Capture (CDC) is the practice of propagating the changes made in an important transactional system to other systems, so that data is kept current and consistent across the enterprise. CDC keeps reporting and analytic systems working on the latest, most accurate data.
Many different CDC strategies exist. Each strategy has advantages and disadvantages. Some put an undue burden on the source database. They can cause queries or applications to become slow or even fail. Some bog down network bandwidth, or have big delays between change and replication.
Each business process has different requirements, as well. For some business needs, a replication delay of more than a second is too long. For others, a delay of less than 24 hours is excellent.
Which CDC strategy will match your business needs? How do you choose?
View this webcast on-demand to learn:
• Advantages and disadvantages of different CDC methods
• The replication latency your project requires
• How to keep data current in Big Data technologies like Hadoop
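One of the simplest CDC strategies to illustrate is snapshot comparison: take two full copies of a table and diff them by primary key to derive inserts, updates and deletes. It is also an example of a method that can burden the source database, since every run reads the whole table, and its replication latency is bounded by how often snapshots are taken. The sketch below is a minimal snapshot diff; the `snapshot_diff` helper and the in-memory dicts standing in for table snapshots are hypothetical, not any vendor's implementation.

```python
from typing import Dict, Hashable, List, Optional, Tuple

Row = Tuple[object, ...]
Change = Tuple[str, Hashable, Optional[Row]]


def snapshot_diff(before: Dict[Hashable, Row], after: Dict[Hashable, Row]) -> List[Change]:
    """Compare two snapshots keyed by primary key and list the changes."""
    changes: List[Change] = []
    for key, row in after.items():
        if key not in before:
            changes.append(("insert", key, row))
        elif before[key] != row:
            changes.append(("update", key, row))
    # Keys present only in the old snapshot were deleted.
    for key in before:
        if key not in after:
            changes.append(("delete", key, None))
    return changes


before = {1: ("Alice", "NY"), 2: ("Bob", "LA"), 3: ("Cara", "SF")}
after = {1: ("Alice", "NY"), 2: ("Bob", "Chicago"), 4: ("Dan", "Austin")}
for change in snapshot_diff(before, after):
    print(change)
# ('update', 2, ('Bob', 'Chicago'))
# ('insert', 4, ('Dan', 'Austin'))
# ('delete', 3, None)
```

Log-based CDC avoids the full-table reads by reading the database's transaction log instead, which is why it is often preferred when low latency and low source impact both matter.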
Join Postgres experts Bruce Momjian and Marc Linster as they preview everything new in Postgres 12. You don’t want to miss this!
Highlights include:
- New compatibility features
- PostgreSQL: Table access methods
- Partitioning Improvements
The Cisco Open SDN Controller is a commercial distribution of OpenDaylight that delivers business agility through automation of standards-based network infrastructure.
Built as a highly scalable software-defined networking (SDN) platform, the Open SDN Controller abstracts away the complexity of managing heterogeneous networks to improve service delivery and reduce operating costs.
The controller exposes REST APIs that allow other applications to take advantage of the controller's capabilities and unlock the power of the underlying network infrastructure, and Java APIs that allow the creation of new network services.
This session will present the basic constructs of the controller and the capabilities of the REST and Java APIs to demonstrate how the Open SDN Controller abstracts away the complexity of managing heterogeneous networks to improve service delivery and reduce operating costs.
As the core SQL processing engine of the Greenplum Unified Analytics Platform, the Greenplum Database delivers industry-leading performance for Big Data analytics while scaling linearly on massively parallel processing (MPP) clusters of standard x86 servers. This session reviews the product's underlying architecture, identifies key areas of differentiation, goes deep into the new features introduced in Greenplum Database Release 4.2, and discusses our plans for 2012.
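Linear scaling on an MPP cluster depends on spreading rows evenly across segments so each server does a similar share of the work. Greenplum can do this by hashing a table's distribution key. The sketch below only illustrates that general idea: it uses `zlib.crc32` as a stand-in hash and a hypothetical `segment_for` helper, which are assumptions and not Greenplum's internal hash function or API.

```python
import zlib
from collections import Counter


def segment_for(distribution_key: str, num_segments: int) -> int:
    """Map a row's distribution key to a segment index (illustrative scheme)."""
    # zlib.crc32 returns the same value on every run, unlike hash() on str,
    # which is randomized per Python process.
    return zlib.crc32(distribution_key.encode("utf-8")) % num_segments


# Spread 10,000 hypothetical customer rows over 4 segments and count per segment.
keys = [f"customer-{i}" for i in range(10_000)]
counts = Counter(segment_for(k, 4) for k in keys)
for segment, rows in sorted(counts.items()):
    print(f"segment {segment}: {rows} rows")
```

Because the same key always lands on the same segment, rows that share a distribution key can be joined or aggregated locally without moving data between servers.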
Analysts and IT vendors are talking about convergence. But what is it? And does it deliver real benefits to IT departments, or is it just a new form of vendor lock-in?
Come and hear HP's take on Converged Infrastructure in an open world, from container-based data centers to hybrid cloud solutions. How can HP Converged Infrastructure help radically simplify and automate IT infrastructure and free up valuable resources for business-oriented initiatives?
The Most Trusted In-Memory Database in the World - Altibase (Altibase)
Life is a database. How you manage data defines business. ALTIBASE HDB, with its hybrid architecture, combines the extreme speed of an in-memory database with the storage capacity of an on-disk database in a single unified engine.
ALTIBASE® HDB™ is the only Hybrid DBMS in the industry that combines an in-memory DBMS with an on-disk DBMS, with a single uniform interface, enabling real-time access to large volumes of data, while simplifying and revolutionizing data processing. ALTIBASE XDB is the world’s fastest in-memory DBMS, featuring unprecedented high performance, and supports SQL-99 standard for wide applicability.
Altibase is a provider of in-memory data solutions for real-time access, analysis and distribution of high volumes of data in mission-critical environments.
Please visit our website (www.altibase.com) to learn more about our products and read more about our case studies. Or contact us at info@altibase.com. We look forward to helping you!
The 3 T's - Using Hadoop to modernize with faster access to data and value (DataWorks Summit)
Near real-time big data analytics is a reality via a new data pattern that avoids the latency and overhead of legacy ETL: the 3 T's of Hadoop (Transfer, Transform, and Translate). Transfer: Once a Hadoop infrastructure is in place, a mandate is needed to immediately and continuously transfer all enterprise data, from external and internal sources and through different existing systems, into Hadoop. Previously, enterprise data was isolated, disconnected and monolithically segmented. Through this T, various source data are consolidated and centralized in Hadoop in near real-time, almost as they are generated. Transform: Most enterprise data flowing into Hadoop is transactional in nature. Analytics requires that data be transformed from record-based OLTP form to column-based OLAP form. This T is not the same as the T in ETL, because we need to retain the granularity of the data feeds. The key is to transform in place within Hadoop, without further data movement from Hadoop to other legacy systems. Translate: We pre-compute or provide on-the-fly views of analytical data, exposed for consumption. We facilitate analysis and reporting, for both scheduled and ad hoc needs, so that analysts and end users can interact with the data integrated in and on top of Hadoop.
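The Transform step changes the layout of the data, not its granularity: every transactional record is kept, but values are regrouped by column so analytic scans read only the fields they need. Production Hadoop deployments typically rely on columnar file formats such as ORC or Parquet for this; the pure-Python `rows_to_columns` helper and the sample orders below are hypothetical and only show the row-to-column pivot.

```python
from typing import Dict, List


def rows_to_columns(rows: List[dict]) -> Dict[str, list]:
    """Pivot record-based (OLTP-style) rows into a column-based layout."""
    if not rows:
        return {}
    columns: Dict[str, list] = {name: [] for name in rows[0]}
    for row in rows:
        if row.keys() != columns.keys():
            raise ValueError("all rows must share the same fields")
        for name, value in row.items():
            columns[name].append(value)
    return columns


orders = [
    {"order_id": 1, "store": "A", "amount": 20.0},
    {"order_id": 2, "store": "B", "amount": 35.5},
    {"order_id": 3, "store": "A", "amount": 12.5},
]
cols = rows_to_columns(orders)
print(cols["amount"])       # [20.0, 35.5, 12.5]
print(sum(cols["amount"]))  # 68.0
```

An aggregate such as total sales now touches a single list instead of every field of every record, while all three original orders remain individually recoverable.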
Eliminating the Challenges of Big Data Management Inside Hadoop (Hortonworks)
Your Big Data strategy is only as good as the quality of your data. Today, deriving business value from data depends on how well your company can capture, cleanse, integrate and manage data. During this webinar, we discuss how to eliminate the challenges to Big Data management inside Hadoop.
Big Data 2.0: Hadoop as part of a Near-Real-Time Integrated Data Era (DataWorks Summit)
A new era of big data is coming, an era we would call "Big Data 2.0," with characteristics including: 1. The lines between data and metadata, and between storage and processing logic, become further blurred. 2. The data integration pattern is shifting from ETL (extract, transform and load) to the 3 T's in Hadoop (transfer, transform and translate). 3. Batch-oriented data pipelines are challenged, even surpassed, by stream-based data flow. 4. In-memory big data processing emerges as a promising new trend. 5. Latency from raw data to business intelligence is dramatically shortened toward real-time or near real-time. 6. Hadoop and other NoSQL solutions are further integrated into the same environment. 7. Mapping and conversion between relational/row-based and column-based data becomes end-user friendly. 8. More ad hoc, interactive, query-based analytics outgrow pure MapReduce. 9. Hadoop evolves from data-server-centric to client-rich. 10. Hadoop becomes the centerpiece of enterprise data systems, with the roles of database, data warehouse, and data center storage all in one, as an integrated platform and solution. This vision of Big Data 2.0 is based on Sears' research, development and production experience, and best practice in enterprise data solutions, which indicate that Hadoop is ready for its prime time in this new era.
Transforming Data Architecture Complexity at Sears - StampedeCon 2013 (StampedeCon)
At the StampedeCon 2013 Big Data conference in St. Louis, Justin Sheppard discussed Transforming Data Architecture Complexity at Sears. High ETL complexity and costs, data latency and redundancy, and batch window limits are just some of the IT challenges caused by traditional data warehouses. Gain an understanding of big data tools through the use cases and technology that enables Sears to solve the problems of the traditional enterprise data warehouse approach. Learn how Sears uses Hadoop as a data hub to minimize data architecture complexity – resulting in a reduction of time to insight by 30-70% – and discover “quick wins” such as mainframe MIPS reduction.
Big Data and advanced analytics are critical topics for executives today. But many still aren't sure how to turn that promise into value. This presentation provides an overview of 16 examples and use cases that lay out the different ways companies have approached the issue and found value: everything from pricing flexibility to customer preference management to credit risk analysis to fraud protection and discount targeting. For the latest on Big Data & Advanced Analytics: http://mckinseyonmarketingandsales.com/topics/big-data
The Big Data and Hadoop training course is designed to provide the knowledge and skills needed to become a successful Hadoop developer. It covers in depth concepts such as the Hadoop Distributed File System (HDFS), setting up a Hadoop cluster, MapReduce, Pig, Hive, HBase, ZooKeeper, Sqoop, etc.
Companies that want to turn excellent customer experience into growth need to master Customer Journeys. Customer Journeys (the set of interactions a customer has with a brand to complete a task), more than individual moments of truth, are what matter to a customer. Companies that master them not only see an improvement in customer experience, loyalty, and operational productivity; they also see above-market growth.
In this slidecast, Richard Treadway and Rich Seger from NetApp discuss the company's storage solutions for Big Data and HPC. The company's HPC solutions for Lustre support massive performance and storage density without sacrificing efficiency.
Enterprise Integration of Disruptive Technologies (DataWorks Summit)
This talk will detail the HSBC Big Data journey to date, walking through the genesis of the Big Data initiative, which was triggered by continual challenges in delivering data-driven products. The global scale, diversity and legacy of an organization like HSBC present challenges for Hadoop adoption not typically faced by younger companies. Big Data technologies are by their very nature disruptive to the established Enterprise IT environment. Hadoop and the peripheral toolsets in the big data ecosystem do not fit comfortably into an Enterprise Data Centre or IT operational processes, and can even prove disruptive to current organization structures. Alasdair will focus on the steps that HSBC has taken to mitigate concerns about Hadoop and raise awareness of the game-changing benefits a successful adoption of the technology will bring. HSBC have taken an innovative approach to proving out the value of the technology, engaging developers with a brakes-off opportunity to use the platform and placing Hadoop in a competitive scenario with traditional technologies. The Hadoop journey in HSBC was initiated in Scotland, blessed in London and proved out in China.
Explores the notion of "Hadoop as a Data Refinery" within an organisation, be it one with an existing Business Intelligence system or none, and looks at 'agile data' as a benefit of using Hadoop as the store for historical, unstructured, and very-large-scale datasets.
The final slides look at the challenge of an organisation becoming "data driven".
Hadoop as Data Refinery - Steve Loughran (JAX London)
Apache Hadoop is often described as a "Big Data Platform", but what does that mean? One way to better understand Hadoop is to talk about how it is used. This talk discusses using Hadoop as a "Data Refinery", a common use case. The concept is much like a traditional oil refinery, except with data: pulling in large quantities of "crude data" over pipelines, refining some of it into useful business intelligence, and refining other pieces into slightly less crude data that stays in the cluster until needed later. This metaphor proves useful when considering how Hadoop could be adopted in an organisation that already has data warehousing and business intelligence systems, and when contemplating how to hook up a Hadoop cluster to the sources of data inside and outside that organisation. A key point to remember is that storing data in Hadoop is no more an end in itself than storing data in a database is: the goal is extracting information from that data. Using Hadoop as a front-end "data refinery" means that it can integrate with existing Business Intelligence systems while providing the platform for new applications.
Introducing the Big Data Ecosystem with Caserta Concepts & Talend (Caserta)
In this one-hour webinar, Caserta Concepts and Talend described an approach to achieving an architectural framework and roadmap for extending a traditional enterprise data warehouse environment into a Big Data ecosystem.
They illustrated the architectural components involved for collecting, analyzing and delivering Big Data, with a focus on the importance of Hadoop, Data Integration, Machine Learning, NoSQL, Business Intelligence and Analytics.
Attendees learned:
Which Big Data technologies can’t be ignored
Considerations when extending the data ecosystem
What happens to your existing investment
What are the points of integration
Does Big Data = better data?
To access the recorded webinar or to learn more, visit http://www.casertaconcepts.com/.
Key trends in Big Data and new reference architecture from Hewlett Packard En... (Ontico)
The rapid development of tools for processing Big Data is giving rise to new approaches to improving performance. Key new technologies in Hadoop 2.0, such as YARN labeling and Storage Tiering, are already being used by Yahoo and eBay. These new technologies pave the way for a significant increase in the efficiency of IT infrastructure for Hadoop, achieving performance gains of several tens of percent while simultaneously reducing memory and power consumption.
HP's reference architecture for Hadoop, the HP Big Data Reference Architecture, proposes using specialized HP Moonshot "microservers" together with high-density HP Apollo storage nodes to achieve the best return on hardware currently available for Hadoop.
Scale-out Storage on Intel® Architecture Based Platforms: Characterizing and ... (Odinot Stanislas)
From Intel's developer-oriented conference (IDF), here is a rather nice presentation on so-called "scale-out" storage, with an overview of the various solution vendors (slide 6), including those offering file, block, and object storage, followed by benchmarks of some of them, including Swift, Ceph, and GlusterFS.
Every day we hear about electronic systems creating ever-increasing quantities of data. Systems in markets such as finance, media, healthcare, government, and scientific research feature strongly in the Big Data processing conversation, and extracting business value from Big Data is forecast to bring customer and competitive advantages. In this session, Vas Kapsalis, NetApp Big Data Business Development Manager, discusses his views and experience on the wider world of Big Data.
Introduction: This workshop will provide a hands-on introduction to Machine Learning (ML) with an overview of Deep Learning (DL).
Format: An introductory lecture on several supervised and unsupervised ML techniques, followed by a light introduction to DL and a short discussion of the current state of the art. Several Python code samples using the scikit-learn library will be introduced that attendees will be able to run in the Cloudera Data Science Workbench (CDSW).
Objective: To provide a quick, hands-on introduction to ML with Python's scikit-learn library. The environment in CDSW is interactive, and the step-by-step guide will walk you through setting up your environment, exploring datasets, and training and evaluating models on popular datasets. By the end of the crash course, attendees will have a high-level understanding of popular ML algorithms and the current state of DL, including what problems they can solve, and will walk away with basic hands-on experience training and evaluating ML models.
Prerequisites: For the hands-on portion, registrants must bring a laptop with a Chrome or Firefox web browser. The labs are done in the cloud, so no installation is needed. Everyone will be able to register and start using CDSW after the introductory lecture concludes (about 1 hour in). Basic knowledge of Python is highly recommended.
Floating on a RAFT: HBase Durability with Apache Ratis (DataWorks Summit)
In a world with a myriad of distributed storage systems to choose from, the majority of Apache HBase clusters still rely on Apache HDFS. Theoretically, any distributed file system could be used by HBase. One major reason HDFS predominates is that HBase's write-ahead log (WAL) has specific durability requirements, and HDFS provides that guarantee correctly. However, with sufficient effort, HBase's use of HDFS for WALs can be replaced.
This talk will cover the design of a "Log Service" that can be embedded inside HBase and provides the level of durability HBase requires for WALs. Apache Ratis (incubating), a Java library implementation of the Raft consensus protocol, is used to build this Log Service. We will cover the design choices of the Ratis Log Service, comparing and contrasting it with other log-based systems that exist today. Next, we'll cover how the Log Service "fits" into HBase and the changes to HBase that enable this. Finally, we'll discuss how the Log Service can simplify the operational burden of HBase.
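The abstract does not describe Ratis internals. As a hedged illustration of the Raft rule a replicated log like this relies on, the sketch below shows how a Raft leader could compute its commit index: an entry counts as committed once it is stored on a majority of nodes. The function and parameter names are hypothetical and are not Ratis APIs, and the sketch omits Raft's additional requirement that a leader only directly commits entries from its current term.

```python
from typing import Dict


def majority_commit_index(match_index: Dict[str, int], leader_last_index: int) -> int:
    """Return the highest log index known to be stored on a majority of nodes.

    match_index maps each follower id to the highest log index replicated on it;
    the leader counts itself with leader_last_index.
    """
    indices = sorted(list(match_index.values()) + [leader_last_index], reverse=True)
    majority = len(indices) // 2 + 1
    # After a descending sort, at least `majority` nodes hold an index >= this entry.
    return indices[majority - 1]


# Five-node cluster: the leader and follower "a" hold index 10, "b" holds 8, and so on.
print(majority_commit_index({"a": 10, "b": 8, "c": 5, "d": 3}, leader_last_index=10))  # 8
```

Sorting keeps the sketch short; a real leader would typically track match indices incrementally rather than re-sort on every acknowledgement.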
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi (DataWorks Summit)
Utilizing Apache NiFi, we read various open data REST APIs and camera feeds to ingest crime and related data in real time, streaming it into HBase and Phoenix tables. HBase makes an excellent storage option for our real-time time series data sources. We can immediately query our data with Apache Zeppelin against Phoenix tables as well as Hive external tables over HBase.
Apache Phoenix tables also make a great option since we can easily put microservices on top of them for application usage. I have an example Spring Boot application that reads from our Philadelphia crime table for front-end web applications as well as RESTful APIs.
Apache NiFi makes it easy to push records with schemas to HBase and insert into Phoenix SQL tables.
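The talk does not give its HBase row-key design. A common pattern for time-series rows, shown here only as an assumption, is a key made of a salt, the entity, and a reversed timestamp, so that the newest events for an entity sort first in a scan. The key layout, bucket count, and function name below are hypothetical.

```python
import hashlib
import struct

NUM_SALT_BUCKETS = 8       # hypothetical number of pre-split regions
MAX_LONG = 2**63 - 1       # Java Long.MAX_VALUE, used to reverse timestamps


def crime_row_key(district: str, event_ts_ms: int) -> bytes:
    """Build a hypothetical row key: salt byte | district | '|' | reversed timestamp."""
    # A salt derived from the district spreads districts across regions while
    # keeping each district's rows contiguous for range scans.
    salt = hashlib.md5(district.encode("utf-8")).digest()[0] % NUM_SALT_BUCKETS
    # Reversing the timestamp makes newer events sort before older ones.
    reversed_ts = MAX_LONG - event_ts_ms
    # Big-endian encoding keeps byte order consistent with numeric order
    # (valid here because reversed_ts is non-negative).
    return bytes([salt]) + district.encode("utf-8") + b"|" + struct.pack(">q", reversed_ts)
```

A scan starting at the prefix salt + district + `|` would then return that district's events newest first.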
Resources:
https://community.hortonworks.com/articles/54947/reading-opendata-json-and-storing-into-phoenix-tab.html
https://community.hortonworks.com/articles/56642/creating-a-spring-boot-java-8-microservice-to-read.html
https://community.hortonworks.com/articles/64122/incrementally-streaming-rdbms-data-to-your-hadoop.html
HBase Tales From the Trenches - Short stories about most common HBase operati... (DataWorks Summit)
Whilst HBase is the most logical answer for use cases requiring random, real-time read/write access to Big Data, designing applications that make the most of it is not always trivial, and it is not the simplest system to operate. Because it depends on and integrates with other components of the Hadoop ecosystem (ZooKeeper, HDFS, Spark, Hive, etc.) and external systems (Kerberos, LDAP), and because its distributed nature requires "Swiss clockwork" infrastructure, many variables must be considered when observing anomalies or even outages. Adding to the equation, HBase is still an evolving product, with different release versions currently in use, some of which can carry genuine software bugs. In this presentation, we'll go through the most common HBase issues faced by different organisations, describing the identified cause and resolution for each, drawing on my last five years supporting HBase for our heterogeneous customer base.
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac... (DataWorks Summit)
LocationTech GeoMesa enables spatial and spatiotemporal indexing and queries for HBase and Accumulo. In this talk, after an overview of GeoMesa’s capabilities in the Cloudera ecosystem, we will dive into how GeoMesa leverages Accumulo’s Iterator interface and HBase’s Filter and Coprocessor interfaces. The goal will be to discuss both what spatial operations can be pushed down into the distributed database and also how the GeoMesa codebase is organized to allow for consistent use across the two database systems.
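GeoMesa's HBase and Accumulo indexes are built on space-filling curves such as the Z-order (Z2/Z3) curve. The sketch below shows only the underlying idea for points: normalize longitude and latitude to integer cells and interleave their bits, so that nearby points tend to share key prefixes and a bounding-box query maps to a limited set of key ranges. It is not GeoMesa's code; the function names and the 31-bit default precision are illustrative assumptions.

```python
def normalize(value: float, lo: float, hi: float, bits: int) -> int:
    """Map a coordinate in [lo, hi] to an integer cell in [0, 2**bits - 1]."""
    cells = (1 << bits) - 1
    return int((value - lo) / (hi - lo) * cells)


def interleave(x: int, y: int, bits: int) -> int:
    """Interleave bits: x goes to even bit positions, y to odd bit positions."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)
        z |= ((y >> i) & 1) << (2 * i + 1)
    return z


def z2_index(lon: float, lat: float, bits: int = 31) -> int:
    """Z-order value of a point; usable as a sortable row-key component."""
    x = normalize(lon, -180.0, 180.0, bits)
    y = normalize(lat, -90.0, 90.0, bits)
    return interleave(x, y, bits)
```

Pushing spatial filters down (via Accumulo iterators or HBase filters and coprocessors, as the talk describes) then only needs to refine the candidates that these key ranges return.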
OCLC has been using HBase since 2012 to enable single-search-box access to over a billion items from your library and the world's library collection. This talk will provide an overview of how HBase is structured to provide this information, some of the challenges OCLC has encountered in scaling to support the world catalog, and how it has overcome them.
Many individuals/organizations have a desire to utilize NoSQL technology, but often lack an understanding of how the underlying functional bits can be utilized to enable their use case. This situation can result in drastic increases in the desire to put the SQL back in NoSQL.
Since the initial commit, Apache Accumulo has provided a number of examples to help jumpstart comprehension of how some of these bits function as well as potentially help tease out an understanding of how they might be applied to a NoSQL friendly use case. One very relatable example demonstrates how Accumulo could be used to emulate a filesystem (dirlist).
In this session we will walk through the dirlist implementation. Attendees should come away with an understanding of the supporting table designs, a simple text search supporting a single wildcard (on file/directory names), and how the dirlist elements work together to accomplish its feature set. Attendees should (hopefully) also come away with a justification for sometimes keeping the SQL out of NoSQL.
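The dirlist table designs themselves are not reproduced here. As an assumption about one way to support a single wildcard over sorted keys, the sketch below keeps every name twice, forward and reversed: a pattern with a literal prefix becomes a prefix scan of the forward entries plus a suffix filter, and a leading wildcard becomes a prefix scan of the reversed entries. Sorted Python lists stand in for an Accumulo table's sorted key space, and the class name is hypothetical.

```python
import bisect
from typing import List


class NameIndex:
    """Toy index with forward and reversed entries for single-wildcard search."""

    def __init__(self, names: List[str]) -> None:
        # Sorted lists stand in for sorted key ranges in an index table.
        self.forward = sorted(names)
        self.reverse = sorted(n[::-1] for n in names)

    @staticmethod
    def _prefix_scan(keys: List[str], prefix: str) -> List[str]:
        # Seek to the first key >= prefix, then read while the prefix still matches.
        start = bisect.bisect_left(keys, prefix)
        out = []
        for key in keys[start:]:
            if not key.startswith(prefix):
                break
            out.append(key)
        return out

    def search(self, pattern: str) -> List[str]:
        if pattern.count("*") != 1:
            raise ValueError("pattern must contain exactly one '*'")
        head, tail = pattern.split("*")
        if head:
            # Literal prefix: scan forward entries, then filter on the suffix.
            hits = self._prefix_scan(self.forward, head)
            return [n for n in hits
                    if n.endswith(tail) and len(n) >= len(head) + len(tail)]
        # Leading wildcard: scan reversed entries by the reversed suffix.
        hits = self._prefix_scan(self.reverse, tail[::-1])
        return sorted(h[::-1] for h in hits)
```

For example, `NameIndex(["readme.txt", "notes.txt", "report.pdf"]).search("*.txt")` touches only the reversed keys starting with `txt.`, rather than every name.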
HBase Global Indexing to support large-scale data ingestion at Uber (DataWorks Summit)
Data serves as the platform for decision-making at Uber. To facilitate data driven decisions, many datasets at Uber are ingested in a Hadoop Data Lake and exposed to querying via Hive. Analytical queries joining various datasets are run to better understand business data at Uber.
Data ingestion, in its most basic form, is about organizing data to balance efficient reading and writing of newer data. Organizing data for efficient reading involves factoring in query patterns when partitioning data, to keep read amplification low. Organizing data for efficient writing involves factoring in the nature of the input data: whether it is append-only or updatable.
At Uber we ingest terabytes of data into many critical, updatable tables such as trips. These tables are a fundamental part of Uber's data-driven solutions and act as the source of truth for all analytical use cases across the company. Datasets such as trips constantly receive updates in addition to inserts. To ingest such datasets, we need a critical component that keeps bookkeeping information about the data layout and annotates each incoming change with the HDFS location where it should be written. This component is called Global Indexing. Without it, all records are treated as inserts and rewritten to HDFS instead of being updated, which duplicates data and breaks data correctness and user queries. This component is key to scaling our jobs, which now handle more than 500 billion writes a day in our current ingestion systems. It needs strong consistency and high throughput for index writes and reads.
At Uber, we chose HBase as the backing store for the Global Indexing component. In this talk, we will discuss data@Uber and explain in more detail why we built the global index using Apache HBase and how this helps us scale out our cluster usage. We'll explain why we chose HBase over other storage systems; how and why we came up with a creative solution that automatically loads HFiles directly into the backend, circumventing the normal write path when bootstrapping our ingestion tables to avoid QPS constraints; and other lessons we learned bringing this system up in production at the scale of data Uber encounters daily.
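A minimal sketch of the index bookkeeping described for Uber's ingestion, assuming a plain dict in place of the HBase-backed index: each incoming change is looked up by record key and tagged either as an update to the file that already holds the record or as an insert into a new file. The names, file paths, and tuple layout are hypothetical, not Uber's implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Change:
    record_key: str                                   # e.g. a trip id
    payload: Dict[str, object] = field(default_factory=dict)


def tag_changes(index: Dict[str, str], changes: List[Change],
                new_file: str) -> List[Tuple[str, str, Change]]:
    """Annotate each change with ("update" | "insert", target file, change)."""
    tagged = []
    for change in changes:
        location = index.get(change.record_key)
        if location is not None:
            # Known key: rewrite the file that already holds this record.
            tagged.append(("update", location, change))
        else:
            # New key: write it to a fresh file and record its location.
            tagged.append(("insert", new_file, change))
            index[change.record_key] = new_file
    return tagged
```

Without the lookup, every change would take the insert branch, which is the duplication problem the abstract describes.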
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix (DataWorks Summit)
Recently, Apache Phoenix has been integrated with the Apache Omid (incubating) transaction processing service to provide ultra-high system throughput with ultra-low latency overhead. Phoenix has been shown to scale beyond 0.5M transactions per second with sub-5ms latency for short transactions on industry-standard hardware. In addition, Omid has been extended to support secondary indexes, multi-snapshot SQL queries, and massive-write transactions.
These innovative features make Phoenix an excellent choice for translytics applications, which allow converged transaction processing and analytics. We share the story of building the next-gen data tier for advertising platforms at Verizon Media that exploits Phoenix and Omid to support multi-feed real-time ingestion and AI pipelines in one place, and discuss the lessons learned.
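Omid provides snapshot isolation: a central timestamp oracle hands out start and commit timestamps and aborts a transaction whose write set conflicts with a transaction that committed after it started. The toy class below shows only that write-write conflict check. Omid's commit table, persistence, and Phoenix integration are omitted, and the class and method names are hypothetical.

```python
import itertools
from typing import Dict, Iterable, Optional


class ToyTimestampOracle:
    """Snapshot-isolation commit check with write-write conflict detection."""

    def __init__(self) -> None:
        self._clock = itertools.count(1)           # monotonically increasing timestamps
        self._last_commit: Dict[str, int] = {}     # row key -> last commit timestamp

    def begin(self) -> int:
        """Start a transaction and return its start (snapshot) timestamp."""
        return next(self._clock)

    def commit(self, start_ts: int, write_set: Iterable[str]) -> Optional[int]:
        """Return a commit timestamp, or None if the transaction must abort."""
        rows = list(write_set)
        # Abort if another transaction committed a write to any of these rows
        # after this transaction took its snapshot.
        if any(self._last_commit.get(row, 0) > start_ts for row in rows):
            return None
        commit_ts = next(self._clock)
        for row in rows:
            self._last_commit[row] = commit_ts
        return commit_ts


tso = ToyTimestampOracle()
t1, t2 = tso.begin(), tso.begin()
print(tso.commit(t1, ["row-a"]))  # 3
print(tso.commit(t2, ["row-a"]))  # None: t1 committed row-a after t2 started
```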
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi (DataWorks Summit)
Cybersecurity requires an organization to collect data, analyze it, and alert on cyber anomalies in near real time. This is a challenging endeavor given the variety of data sources that need to be collected and analyzed. Everything from application logs, network events, authentication systems, IoT devices, business events, cloud service logs, and more needs to be taken into consideration. In addition, multiple data formats need to be transformed and conformed so that both humans and ML/AI algorithms can understand them.
To solve this problem, the Aetna Global Security team developed the Unified Data Platform based on Apache NiFi, which allows them to remain agile and adapt to new security threats and the onboarding of new technologies in the Aetna environment. The platform currently has over 60 different data flows with 95% doing real-time ETL and handles over 20 billion events per day. In this session learn from Aetna’s experience building an edge to AI high-speed data pipeline with Apache NiFi.
In the healthcare sector, data security, governance, and quality are crucial for maintaining patient privacy and ensuring the highest standards of care. At Florida Blue, the leading health insurer of Florida serving over five million members, there is a multifaceted network of care providers, business users, sales agents, and other divisions relying on the same datasets to derive critical information for multiple applications across the enterprise. However, maintaining consistent data governance and security for protected health information and other extended data attributes has always been a complex challenge that did not easily accommodate the wide range of needs for Florida Blue’s many business units. Using Apache Ranger, we developed a federated Identity & Access Management (IAM) approach that allows each tenant to have their own IAM mechanism. All user groups and roles are propagated across the federation in order to determine users’ data entitlement and access authorization; this applies to all stages of the system, from the broadest tenant levels down to specific data rows and columns. We also enabled audit attributes to ensure data quality by documenting data sources, reasons for data collection, date and time of data collection, and more. In this discussion, we will outline our implementation approach, review the results, and highlight our “lessons learned.”
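Apache Ranger supports row-level filtering and column masking policies. The sketch below is not Ranger's API; it only illustrates the evaluation order described above under assumed, hypothetical groups and columns: resolve the user's group, keep the rows that group is entitled to, then mask the protected columns, denying access when no policy matches.

```python
from typing import Any, Dict, List

# Hypothetical policies keyed by user group: a row filter plus columns to mask.
POLICIES: Dict[str, Dict[str, Any]] = {
    "sales_agents": {"row_filter": lambda row: row["state"] == "FL",
                     "masked": {"ssn", "diagnosis"}},
    "care_providers": {"row_filter": lambda row: True,
                       "masked": {"ssn"}},
}


def apply_policy(group: str, rows: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Return only the rows the group may see, with protected columns masked."""
    policy = POLICIES.get(group)
    if policy is None:
        return []  # deny by default when no policy matches
    visible = [row for row in rows if policy["row_filter"](row)]
    return [{col: ("****" if col in policy["masked"] else val) for col, val in row.items()}
            for row in visible]
```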
Presto: Optimizing Performance of SQL-on-Anything Engine (DataWorks Summit)
Presto, an open source distributed SQL engine, is widely recognized for its low-latency queries, high concurrency, and native ability to query multiple data sources. Proven at scale in a variety of use cases at Airbnb, Bloomberg, Comcast, Facebook, FINRA, LinkedIn, Lyft, Netflix, Twitter, and Uber, Presto has experienced unprecedented growth in popularity in the last few years, in both on-premises and cloud deployments over object stores, HDFS, NoSQL, and RDBMS data stores.
With the ever-growing list of connectors to new data sources such as Azure Blob Storage, Elasticsearch, Netflix Iceberg, Apache Kudu, and Apache Pulsar, the recently introduced cost-based optimizer in Presto must account for heterogeneous inputs with differing and often incomplete data statistics. This talk will explore this topic in detail and discuss the best use cases for Presto across several industries. In addition, we will present recent Presto advancements, such as geospatial analytics at scale, and the project roadmap going forward.
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl... (DataWorks Summit)
Specialized tools for machine learning development and model governance are becoming essential. MLflow is an open source platform for managing the machine learning lifecycle. By adding just a few lines of code to the function or script that trains their model, data scientists can log parameters, metrics, artifacts (plots, miscellaneous files, etc.), and a deployable packaging of the ML model. Every time that function or script is run, the results are logged automatically as a byproduct of those lines of code, even if the person doing the training run makes no special effort to record the results. MLflow application programming interfaces (APIs) are available for Python, R, and Java, and MLflow also offers a language-agnostic REST API. Over a relatively short period, MLflow has garnered more than 3,300 stars on GitHub, almost 500,000 monthly downloads, and 80 contributors from more than 40 companies. Most significantly, more than 200 companies are now using MLflow. We will demo the MLflow Tracking, Projects, and Models components with Azure Machine Learning (AML) Services and show you how easy it is to get started with MLflow on-premises or in the cloud.
Extending Twitter's Data Platform to Google Cloud (DataWorks Summit)
Twitter's Data Platform is built using multiple complex open source and in-house projects to support data analytics on hundreds of petabytes of data. Our platform supports storage, compute, data ingestion, discovery, and management, along with various tools and libraries that help users with both batch and real-time analytics. Our Data Platform operates on multiple clusters across different data centers to help thousands of users discover valuable insights. As we were scaling our Data Platform to multiple clusters, we also evaluated various cloud vendors to support use cases outside of our data centers. In this talk we share our architecture and how we extended our data platform to use the cloud as another data center. We walk through our evaluation process, the challenges we faced supporting data analytics at Twitter scale in the cloud, and our current solution. Extending Twitter's Data Platform to the cloud was a complex task, which we dive deep into in this presentation.
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi (DataWorks Summit)
At Comcast, our team has been architecting a customer experience platform which is able to react to near-real-time events and interactions and deliver appropriate and timely communications to customers. By combining the low latency capabilities of Apache Flink and the dataflow capabilities of Apache NiFi we are able to process events at high volume to trigger, enrich, filter, and act/communicate to enhance customer experiences. Apache Flink and Apache NiFi complement each other with their strengths in event streaming and correlation, state management, command-and-control, parallelism, development methodology, and interoperability with surrounding technologies. We will trace our journey from starting with Apache NiFi over three years ago and our more recent introduction of Apache Flink into our platform stack to handle more complex scenarios. In this presentation we will compare and contrast which business and technical use cases are best suited to which platform and explore different ways to integrate the two platforms into a single solution.
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger (DataWorks Summit)
Companies are increasingly moving to the cloud to store and process data. One of the challenges they face is securing data across hybrid environments with an easy way to centrally manage policies. In this session, we will talk through how companies can use Apache Ranger to protect access to data both on-premises and in cloud environments. We will go into detail on the challenges of hybrid environments and how Ranger can solve them. We will also talk through how companies can further enhance security by leveraging Ranger to anonymize or tokenize data while moving it into the cloud, and to de-anonymize it dynamically using Apache Hive, Apache Spark, or when accessing data from cloud storage systems. We will also deep dive into Ranger's integration with AWS S3, AWS Redshift, and other cloud-native systems. We will wrap up with an end-to-end demo showing how policies can be created in Ranger and used to manage access to data in different systems, anonymize or de-anonymize data, and track where data is flowing.
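As a hedged illustration of tokenizing data on its way to the cloud and de-tokenizing it for authorized readers, the toy class below derives deterministic tokens with HMAC-SHA256, so equal values still join on equal tokens, and keeps a lookup table for reversal. It is not how Ranger implements this; the class name, token format, and boolean authorization flag are assumptions standing in for a real policy check and a secured token store.

```python
import hashlib
import hmac
from typing import Dict


class ToyTokenVault:
    """Deterministic tokenization with a lookup table for de-tokenization."""

    def __init__(self, key: bytes) -> None:
        self._key = key
        self._vault: Dict[str, str] = {}  # token -> original value

    def tokenize(self, value: str) -> str:
        # Same input and key always yield the same token, so joins on tokens still work.
        digest = hmac.new(self._key, value.encode("utf-8"), hashlib.sha256).hexdigest()
        token = "tok_" + digest[:16]
        self._vault[token] = value
        return token

    def detokenize(self, token: str, authorized: bool) -> str:
        # Only callers that pass the (hypothetical) policy check get the original back.
        if not authorized:
            return token
        return self._vault[token]
```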
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory... (DataWorks Summit)
Advanced Big Data processing frameworks have been proposed to harness the fast data transmission capability of Remote Direct Memory Access (RDMA) over high-speed networks such as InfiniBand, RoCEv1, RoCEv2, iWARP, and OmniPath. However, with the introduction of Non-Volatile Memory (NVM) and NVM Express (NVMe) based SSDs, these designs, along with the default Big Data processing models, need to be re-assessed to discover possibilities for further performance improvement. In this talk, we will present NRCIO, a high-performance communication runtime for non-volatile memory over modern network interconnects that can be leveraged by existing Big Data processing middleware. We will show the performance of non-volatile-memory-aware RDMA communication protocols using our proposed runtime and demonstrate its benefits by incorporating it into a high-performance in-memory key-value store, Apache Hadoop, Tez, Spark, and TensorFlow. Evaluation results illustrate that NRCIO can achieve up to 3.65x performance improvement for representative Big Data processing workloads on modern data centers.
Background: Some early applications of Computer Vision in Retail arose from e-commerce use cases - but increasingly, it is being used in physical stores in a variety of new and exciting ways, such as:
● Optimizing merchandising execution, in-stocks and sell-thru
● Enhancing operational efficiencies and enabling real-time customer engagement
● Enhancing loss-prevention capabilities and response time
● Creating frictionless experiences for shoppers
Abstract: This talk will cover the use of Computer Vision in Retail, the implications to the broader Consumer Goods industry and share business drivers, use cases and benefits that are unfolding as an integral component in the remaking of an age-old industry.
We will also take a ‘peek under the hood’ of Computer Vision and Deep Learning, sharing technology design principles and skill set profiles to consider before starting your CV journey.
Deep learning has matured considerably in the past few years to produce human or superhuman abilities in a variety of computer vision paradigms. We will discuss ways to recognize these paradigms in retail settings, collect and organize data to create actionable outcomes with the new insights and applications that deep learning enables.
We will cover the basics of object detection, then move into advanced image processing, describing possible ways a retail store of the near future could operate. By attaching a deep learning system to a camera stream, a store could identify various storefront situations, such as item stock levels on shelves, a shelf in need of organization, or a wandering customer in need of assistance.
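As a small example of the object-detection basics mentioned above, detectors are commonly evaluated by intersection over union (IoU), the overlap between a predicted and a reference bounding box. The function below assumes boxes given as (x_min, y_min, x_max, y_max); it is a generic illustration, not code from the talk.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    # Width and height of the overlap, clamped at zero when the boxes are disjoint.
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

A detection is often counted as correct when its IoU with a labeled box exceeds a threshold such as 0.5.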
We will also cover how to use a computer vision system to automatically track customer purchases to enable a streamlined checkout process, and how deep learning can power plausible wardrobe suggestions based on what a customer is currently wearing or purchasing.
Finally, we will cover the various technologies powering these applications today: deep learning tools for research and development; production tools to distribute that intelligence to all the cameras situated around a retail location; and tools for exploring and understanding the new data streams produced by the computer vision systems.
By the end of this talk, attendees should understand the impact Computer Vision and Deep Learning are having in the Consumer Goods industry, key use cases, techniques and key considerations leaders are exploring and implementing today.
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark (DataWorks Summit)
Whole-genome shotgun-based next-generation transcriptomics and metagenomics studies often generate 100 to 1000 gigabytes (GB) of sequence data derived from tens of thousands of different genes or microbial species. De novo assembly of these data ideally requires a solution that both scales with data size and optimizes for individual genes or genomes. Here we developed an Apache Spark-based scalable sequence clustering application, SparkReadClust (SpaRC), that partitions reads based on their molecule of origin to enable downstream assembly optimization. SpaRC produces high clustering performance on transcriptomics and metagenomics test datasets from both short-read and long-read sequencing technologies. It achieved near-linear scalability with respect to input data size and number of compute nodes. SpaRC can run in different cloud computing environments without modification while delivering similar performance. In summary, our results suggest that SpaRC provides a scalable solution for clustering billions of reads from next-generation sequencing experiments, and that Apache Spark represents a cost-effective solution with rapid development and deployment cycles for similar big data genomics problems.
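SpaRC's actual Spark implementation and graph-clustering step are not described in the abstract. As a simplified, single-machine sketch of the underlying idea, the code below links reads that share a k-mer and returns the connected components via union-find. Real pipelines would also handle reverse complements and filter out very common k-mers, both omitted here; the function names are hypothetical.

```python
from typing import Dict, List, Set


def kmers(seq: str, k: int) -> Set[str]:
    """All distinct substrings of length k in a read."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def cluster_reads(reads: List[str], k: int) -> List[List[int]]:
    """Group read indices into connected components of a shared-k-mer graph."""
    parent = list(range(len(reads)))

    def find(i: int) -> int:
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    first_seen: Dict[str, int] = {}
    for idx, read in enumerate(reads):
        for kmer in kmers(read, k):
            if kmer in first_seen:
                # Two reads sharing a k-mer are merged into the same cluster.
                parent[find(idx)] = find(first_seen[kmer])
            else:
                first_seen[kmer] = idx

    clusters: Dict[int, List[int]] = {}
    for idx in range(len(reads)):
        clusters.setdefault(find(idx), []).append(idx)
    return sorted(clusters.values())
```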
Generative AI Deep Dive: Advancing from Proof of Concept to Production (Aggregage)
Join Maher Hanafi, VP of Engineering at Betterworks, in this new session where he'll share a practical framework to transform Gen AI prototypes into impactful products! He'll delve into the complexities of data collection and management, model selection and optimization, and ensuring security, scalability, and responsible use.
A tale of scale & speed: How the US Navy is enabling software delivery from l... (sonjaschweigert1)
Rapid and secure feature delivery is a goal across every application team and every branch of the DoD. The Navy’s DevSecOps platform, Party Barge, has achieved:
- Reduction in onboarding time from 5 weeks to 1 day
- Improved developer experience and productivity through actionable findings and reduction of false positives
- Maintenance of superior security standards and inherent policy enforcement with Authorization to Operate (ATO)
Development teams can ship efficiently and ensure applications are cyber ready for Navy Authorizing Officials (AOs). In this webinar, Sigma Defense and Anchore will give attendees a look behind the scenes and demo secure pipeline automation and security artifacts that speed up application ATO and time to production.
We will cover:
- How to remove silos in DevSecOps
- How to build efficient development pipeline roles and component templates
- How to deliver security artifacts that matter for ATOs (SBOMs, vulnerability reports, and policy evidence)
- How to streamline operations with automated policy checks on container images
The Art of the Pitch: WordPress Relationships and Sales (Laura Byrne)
Clients don’t know what they don’t know. What web solutions are right for them? How does WordPress come into the picture? How do you make sure you understand scope and timeline? What do you do if something changes?
All these questions and more will be explored as we talk about matching clients’ needs with what your agency offers without pulling teeth or pulling your hair out. Expect practical tips and strategies for successful relationship building that leads to closing the deal.
Essentials of Automations: Optimizing FME Workflows with Parameters (Safe Software)
Are you looking to streamline your workflows and boost your projects’ efficiency? Do you find yourself searching for ways to add flexibility and control over your FME workflows? If so, you’re in the right place.
Join us for an insightful dive into the world of FME parameters, a critical element in optimizing workflow efficiency. This webinar marks the beginning of our three-part “Essentials of Automation” series. This first webinar is designed to equip you with the knowledge and skills to utilize parameters effectively: enhancing the flexibility, maintainability, and user control of your FME projects.
Here’s what you’ll gain:
- Essentials of FME Parameters: Understand the pivotal role of parameters, including Reader/Writer, Transformer, User, and FME Flow categories. Discover how they are the key to unlocking automation and optimization within your workflows.
- Practical Applications in FME Form: Delve into key user parameter types including choice, connections, and file URLs. Allow users to control how a workflow runs, making your workflows more reusable. Learn to import values and deliver the best user experience for your workflows while enhancing accuracy.
- Optimization Strategies in FME Flow: Explore the creation and strategic deployment of parameters in FME Flow, including the use of deployment and geometry parameters, to maximize workflow efficiency.
- Pro Tips for Success: Gain insights on parameterizing connections and leveraging new features like Conditional Visibility for clarity and simplicity.
We’ll wrap up with a glimpse into future webinars, followed by a Q&A session to address your specific questions surrounding this topic.
Don’t miss this opportunity to elevate your FME expertise and drive your projects to new heights of efficiency.
Welcome to ViralQR, your best QR code generator. (ViralQR)
Welcome to ViralQR, your best QR code generator available on the market!
At ViralQR, we design static and dynamic QR codes. Our mission is to make business operations easier and customer engagement more powerful through the use of QR technology. Be it a small-scale business or a huge enterprise, our easy-to-use platform provides multiple choices that can be tailored according to your company's branding and marketing strategies.
Our Vision
We are here to make the process of creating QR codes easy and smooth, thus enhancing customer interaction and making business more fluid. We very strongly believe in the ability of QR codes to change the world for businesses in their interaction with customers and are set on making that technology accessible and usable far and wide.
Our Achievements
Ever since its inception, we have successfully served many clients by offering QR codes in their marketing, service delivery, and collection of feedback across various industries. Our platform has been recognized for its ease of use and amazing features, which helped a business to make QR codes.
Our Services
At ViralQR, here is a comprehensive suite of services that caters to your very needs:
Static QR Codes: Create free static QR codes. These QR codes are able to store significant information such as URLs, vCards, plain text, emails and SMS, Wi-Fi credentials, and Bitcoin addresses.
Dynamic QR codes: These also have all the advanced features but are subscription-based. They can directly link to PDF files, images, micro-landing pages, social accounts, review forms, business pages, and applications. In addition, they can be branded with CTAs, frames, patterns, colors, and logos to enhance your branding.
Pricing and Packages
Additionally, there is a 14-day free offer to ViralQR, which is an exceptional opportunity for new users to take a feel of this platform. One can easily subscribe from there and experience the full dynamic of using QR codes. The subscription plans are not only meant for business; they are priced very flexibly so that literally every business could afford to benefit from our service.
Why choose us?
ViralQR will provide services for marketing, advertising, catering, retail, and the like. The QR codes can be posted on fliers, packaging, merchandise, and banners, as well as to substitute for cash and cards in a restaurant or coffee shop. With QR codes integrated into your business, improve customer engagement and streamline operations.
Comprehensive Analytics
Subscribers of ViralQR receive detailed analytics and tracking tools in light of having a view of the core values of QR code performance. Our analytics dashboard shows aggregate views and unique views, as well as detailed information about each impression, including time, device, browser, and estimated location by city and country.
So, thank you for choosing ViralQR; we have an offer of nothing but the best in terms of QR code services to meet business diversity!
GraphRAG is All You need? LLM & Knowledge GraphGuy Korland
Guy Korland, CEO and Co-founder of FalkorDB, will review two articles on the integration of language models with knowledge graphs.
1. Unifying Large Language Models and Knowledge Graphs: A Roadmap.
https://arxiv.org/abs/2306.08302
2. Microsoft Research's GraphRAG paper and a review paper on various uses of knowledge graphs:
https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...DanBrown980551
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses;
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfPaige Cruz
Monitoring and observability aren’t traditionally found in software curriculums and many of us cobble this knowledge together from whatever vendor or ecosystem we were first introduced to and whatever is a part of your current company’s observability stack.
While the dev and ops silo continues to crumble….many organizations still relegate monitoring & observability as the purview of ops, infra and SRE teams. This is a mistake - achieving a highly observable system requires collaboration up and down the stack.
I, a former op, would like to extend an invitation to all application developers to join the observability party will share these foundational concepts to build on:
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdfPeter Spielvogel
Building better applications for business users with SAP Fiori.
• What is SAP Fiori and why it matters to you
• How a better user experience drives measurable business benefits
• How to get started with SAP Fiori today
• How SAP Fiori elements accelerates application development
• How SAP Build Code includes SAP Fiori tools and other generative artificial intelligence capabilities
• How SAP Fiori paves the way for using AI in SAP apps
PHP Frameworks: I want to break free (IPC Berlin 2024)Ralf Eggert
In this presentation, we examine the challenges and limitations of relying too heavily on PHP frameworks in web development. We discuss the history of PHP and its frameworks to understand how this dependence has evolved. The focus will be on providing concrete tips and strategies to reduce reliance on these frameworks, based on real-world examples and practical considerations. The goal is to equip developers with the skills and knowledge to create more flexible and future-proof web applications. We'll explore the importance of maintaining autonomy in a rapidly changing tech landscape and how to make informed decisions in PHP development.
This talk is aimed at encouraging a more independent approach to using PHP frameworks, moving towards a more flexible and future-proof approach to PHP development.
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Ramesh Iyer
In today's fast-changing business world, Companies that adapt and embrace new ideas often need help to keep up with the competition. However, fostering a culture of innovation takes much work. It takes vision, leadership and willingness to take risks in the right proportion. Sachin Dev Duggal, co-founder of Builder.ai, has perfected the art of this balance, creating a company culture where creativity and growth are nurtured at each stage.
3. Why Hadoop and Why Now?
THE ADVANTAGES:
• Cost reduction
• Alleviate performance bottlenecks
• ETL too expensive and complex
• Mainframe and Data Warehouse processing → Hadoop
THE CHALLENGE:
• Traditional enterprises' lack of awareness
THE SOLUTION:
• Leverage the growing support system for Hadoop
• Make Hadoop the data hub in the Enterprise
• Use Hadoop for processing batch and analytic jobs
4. The Classic Enterprise Challenge
• Growing Data Volumes
• Shortened Processing Windows
• Tight IT Budgets
• Latency in Data
• Escalating Costs
• Hitting Scalability Ceilings
• ETL Complexity
• Demanding Business Requirements
5. The Sears Holdings Approach
Key to our Approach:
1) allowing users to continue to use familiar consumption interfaces
2) providing inherent HA
3) enabling businesses to unlock previously unusable data
1. Implement a Hadoop-centric reference architecture
2. Move enterprise batch processing to Hadoop
3. Make Hadoop the single point of truth
4. Massively reduce ETL by transforming within Hadoop
5. Move results and aggregates back to legacy systems for consumption
6. Retain, within Hadoop, source files at the finest granularity for re-use
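As a rough illustration of transforming data within Hadoop, moving only aggregates back to legacy systems, and retaining the source at the finest granularity, here is a minimal Python sketch. It is not MetaScale's implementation: in-memory lists stand in for source files in HDFS, the record layout and the names `raw_sales`, `aggregate_by_store_item` and `export_for_legacy` are hypothetical, and the CSV string stands in for whatever load format a legacy system expects.

```python
import csv
import io
from collections import defaultdict

# Hypothetical fine-grained sales records (store_id, item_id, units, revenue).
# They stand in for source files kept at the finest granularity and never discarded.
raw_sales = [
    ("S1", "I1", 3, 29.97),
    ("S1", "I1", 1, 9.99),
    ("S1", "I2", 2, 5.00),
    ("S2", "I1", 4, 39.96),
]


def aggregate_by_store_item(rows):
    """Transform step done where the data lives: roll rows up to store-item totals."""
    totals = defaultdict(lambda: [0, 0.0])
    for store, item, units, revenue in rows:
        totals[(store, item)][0] += units
        totals[(store, item)][1] += revenue
    return totals


def export_for_legacy(totals):
    """Serialize only the compact aggregates as CSV for a downstream legacy system."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["store_id", "item_id", "units", "revenue"])
    for (store, item), (units, revenue) in sorted(totals.items()):
        writer.writerow([store, item, units, f"{revenue:.2f}"])
    return buf.getvalue()


if __name__ == "__main__":
    print(export_for_legacy(aggregate_by_store_item(raw_sales)))
```

The raw rows are left untouched, so the same fine-grained data can be re-aggregated later for a different consumer instead of being re-extracted from the source systems.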
6. The Architecture
• Enterprise solutions using Hadoop must be an eco-system
• Large companies have a complex environment:
– Transactional systems
– Services
– EDW and Data marts
– Reporting tools and needs
• We needed to build an entire solution
8. The Learning
Over two years of experience using Hadoop for enterprise legacy workloads:
HADOOP
• We can dramatically reduce batch processing times for mainframe and EDW
• We can retain and analyze data at a much more granular level, with longer history
• Hadoop must be part of an overall solution and eco-system
IMPLEMENTATION
• We can reliably meet our production deliverable time-windows by using Hadoop
• We can largely eliminate the use of traditional ETL tools
• New tools allow improved user experience on very large data sets
UNIQUE VALUE
• We developed tools and skills – the learning curve is not to be underestimated
• We developed experience in moving workload from expensive, proprietary mainframe and EDW platforms to Hadoop with spectacular results
10. The Challenge – Use-Case #1
Sales: 8.9B Line Items
Offers: 1.4B SKUs
Items: 11.3M SKUs
Stores: 3200 Sites
Inventory: 1.8B rows
Price Sync: Daily
Timing: Weekly
Elasticity: 12.6B Parameters
• Intensive computational and large storage requirements
• Needed to calculate item price elasticity based on 8 billion rows of sales data
• Could only be run quarterly and on a subset of data; needed to run more often
• Business need: react to market conditions and new product launches
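The slides do not say how elasticity was computed. A common textbook choice, used here purely as an assumption, is a constant-elasticity demand model ln Q = a + e·ln P, where the elasticity e is the least-squares slope of log quantity on log price. The Python sketch below estimates one e per store-item key. Grouping rows by key and then reducing each group independently has the same shape as a MapReduce shuffle and reduce, which is why this kind of job partitions naturally across a cluster. The row layout `(store, item, price, units)` and the function names are hypothetical.

```python
import math
from collections import defaultdict


def log_log_elasticity(prices, quantities):
    """OLS slope of ln(quantity) on ln(price).

    Assumes a constant-elasticity demand model: ln Q = a + e * ln P.
    """
    xs = [math.log(p) for p in prices]
    ys = [math.log(q) for q in quantities]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("prices must vary to estimate an elasticity")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def elasticity_by_store_item(sales_rows):
    """Estimate one elasticity per (store, item) key from (store, item, price, units) rows."""
    # "Shuffle": collect every observation for the same store-item key together.
    groups = defaultdict(lambda: ([], []))
    for store, item, price, units in sales_rows:
        if price > 0 and units > 0:  # logs need positive values; zero-sale rows are skipped
            prices, quantities = groups[(store, item)]
            prices.append(price)
            quantities.append(units)

    # "Reduce": each key is independent, so keys could be processed in parallel.
    results = {}
    for key, (prices, quantities) in groups.items():
        if len(set(prices)) >= 2:  # a slope needs at least two distinct prices
            results[key] = log_log_elasticity(prices, quantities)
    return results


if __name__ == "__main__":
    rows = [
        ("S1", "I1", 1.0, 1000.0), ("S1", "I1", 2.0, 250.0), ("S1", "I1", 4.0, 62.5),
        ("S2", "I1", 2.0, 100.0), ("S2", "I1", 4.0, 50.0),
        ("S3", "I2", 5.0, 10.0),  # only one price point, so no estimate
    ]
    for key, e in sorted(elasticity_by_store_item(rows).items()):
        print(key, round(e, 3))  # S1/I1 about -2.0, S2/I1 about -1.0
```

A production job would likely add controls (seasonality, promotions, cross-item effects) that this sketch deliberately omits.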
11. The Result – Use-Case #1
Business Problem:
• Intensive computational and large storage requirements
• Needed to calculate store-item price elasticity based on 8 billion rows of sales data
• Could only be run quarterly and on a subset of data
• Business was missing the opportunity to react to changing market conditions and new product launches
Results on Hadoop:
• Price elasticity calculated weekly
• New business capability enabled
• 100% of data set and granularity
• Meets all SLAs
12. The Challenge – Use-Case #2
Data Sources: 30+
Input Records: Billions
Mainframe: 100 MIPS on 1% of data
Mainframe Scalability: Unable to scale 100 fold
→ Hadoop
• Mainframe batch business process would not scale
• Needed to process 100 times more detail to handle business critical functionality
• Business need required processing billions of records from 30 input data sources
• Complex business logic and financial calculations
• SLA for this cyclic process was 2 hours per run
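To see why a 100-fold increase was out of reach on the mainframe, a naive linear extrapolation of the slide's figure (100 MIPS on 1% of the data) can be sketched as below. The assumption that MIPS scale linearly with data volume is ours, not the slide's; real batch workloads rarely scale exactly linearly, so treat the result (on the order of 10,000 MIPS) only as an order-of-magnitude signal. The function name `extrapolate_mips` is hypothetical.

```python
def extrapolate_mips(observed_mips, observed_fraction, target_fraction=1.0):
    """Scale MIPS linearly with the fraction of data processed.

    Linear scaling is an illustrative assumption, not a measured property.
    """
    if not 0 < observed_fraction <= 1:
        raise ValueError("observed_fraction must be in (0, 1]")
    return observed_mips * (target_fraction / observed_fraction)


if __name__ == "__main__":
    # Slide figures: 100 MIPS consumed while processing 1% of the data.
    full_volume_mips = extrapolate_mips(100, 0.01)
    print(f"Naive estimate for 100% of the data: {full_volume_mips:,.0f} MIPS")
```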
13. The Result – Use-Case #2
Business Problem:
• Mainframe batch business process would not scale
• Needed to process 100 times more detail to handle rollout of high-value, business-critical functionality
• Time-sensitive business need required processing billions of records from 30 input data sources
• Complex business logic and financial calculations
• SLA for this cyclic process was 2 hours per run
Results on Hadoop:
• Teradata & Mainframe Data on Hadoop
• Implemented PIG for Processing
• JAVA UDFs for financial calculations
• Scalable Solution in 8 Weeks
• 6000 Lines Reduced to 400 Lines of PIG
• Processing Met Tighter SLA
• $600K Annual Savings
14. The Challenge – Use-Case #3
Data Storage: Mainframe DB2 Tables
Price Data: 500M Records
Processing Window: 3.5 Hours
Mainframe Jobs: 64
→ Hadoop
• Mainframe unable to meet SLAs on growing data volume
15. The Result – Use-Case #3
Business Problem:
• Mainframe unable to meet SLAs on growing data volume
Results on Hadoop:
• Source Data in Hadoop
• Job Runs Over 100% faster – Now in 1.5 hours
• $100K in Annual Savings
• Maintenance Improvement – <50 Lines PIG code
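The "over 100% faster" figure can be checked with simple arithmetic on the two processing windows (3.5 hours before, 1.5 hours after). This short Python helper, whose name is hypothetical, separates the three ways the same change is commonly reported:

```python
def batch_window_gain(before_hours, after_hours):
    """Express a shorter batch window three ways: speedup, run-time cut, throughput gain."""
    if before_hours <= 0 or after_hours <= 0:
        raise ValueError("durations must be positive")
    speedup = before_hours / after_hours                               # how many times faster
    time_cut_pct = (before_hours - after_hours) / before_hours * 100   # share of the window saved
    throughput_gain_pct = (speedup - 1) * 100                          # "X% faster" as throughput
    return speedup, time_cut_pct, throughput_gain_pct


if __name__ == "__main__":
    speedup, cut, gain = batch_window_gain(3.5, 1.5)
    print(f"speedup {speedup:.2f}x, run time cut {cut:.1f}%, throughput up {gain:.1f}%")
```

Read this way, going from 3.5 to 1.5 hours is a speedup of about 2.33x: the run time shrinks by about 57%, and throughput rises by about 133%, which is consistent with "over 100% faster".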
16. The Challenge – Use-Case #4
Batch Processing Output: .CSV Files
Transformation: On Teradata
History Retained: No
User Experience: Unacceptable
New Report Development: Slow
Teradata via Business Objects
→ Hadoop
• Needed to enhance user experience and the ability to perform analytics on granular data
• Restricted availability of data due to space constraints
• Needed to retain granular data
• Needed agile, Excel-format interaction on data sources of 100 million records
17. The Result – Use-Case #4
Business Problem:
• Needed to enhance user experience and the ability to perform analytics on granular data
• Restricted availability of data due to space constraints
• Needed to retain granular data
• Needed agile, Excel-format interaction on data sources of 100 million records
Results on Hadoop:
• Sourcing Data Directly to Hadoop
• Redundant Storage Eliminated
• Transformation Moved to Hadoop
• User Experience Expectations Met
• Over 50 Data Sources Retained in Hadoop
• Datameer for Additional Analytics
• Granular History Retained
• Business’s Single Source of Truth
• PIG Scripts to Ease Code Maintenance
18. Summary
• Hadoop can handle Enterprise workload
• Can reduce strain on legacy platforms
• Can reduce cost
• Can bring new business opportunities
• Must be an eco-system
• Must be part of an overall data strategy
• Not to be underestimated
19. The Horizon – What do we need next
• Automation tools and techniques that ease the Enterprise integration of Hadoop
• Educate traditional Enterprise IT organizations about the possibilities and reasons to deploy Hadoop
• Continue development of a reusable framework for legacy workload migration
20. For more information, visit:
www.metascale.com
Follow us on Twitter @BigDataMadeEasy
Join us on LinkedIn: www.linkedin.com/company/metascale-llc