As one of the largest processors and controllers of global information, IBM has embarked on a global program towards GDPR compliance readiness. Using the same methodology, services, and solutions as it does with clients, this session will demonstrate how this process can serve as a model for GDPR readiness in any large enterprise, and how that model can then become a basis for complying with other regulatory requirements and a framework for future business transformation and opportunity. Specifics will include:
• A summary of the needs and opportunities of the GDPR regulation
• With the time remaining, where you stand and what can still be done
• A prescriptive phased methodology of execution
• Core solution technical measures and capabilities
• Key GDPR actionable outcomes by stakeholder
The focus is on discovering, mapping, and managing personal data for GDPR, along with data protection and compliance, on Hadoop in a sustainable way.
Speaker
Richard Hogg, Global GDPR Evangelist, IBM
Adapting to the exponential development of technology (DataWorks Summit)
Digitalization is impacting every industry in every economy, disrupting markets and upending competitive hierarchies. Across every business discipline, from operations and manufacturing to marketing and sales, companies are struggling to integrate new data, new analytics, and new technologies into their existing processes and practices. Companies that successfully adapt to these new and accelerating changes will outperform their competitors.
The new challenges require new ways of organizing, new social and technical architectures. In this session, we identify the key challenges businesses face in the era of massive digitalization, and the organizational and architectural steps that will enable savvy competitors to find business value in this time of tumultuous technical change.
The gap between first steps and effective digitalization is large. For instance, businesses can run scrum teams without being agile; can have a data science lab without supporting autonomous model deployment to consumer-facing processes and applications; and can stand up a Hadoop ecosystem without understanding how the components fit together or what to do with it.
Based on our deep experience with dozens of clients, we describe key differences in levels of technology maturity, and practical tactics leading companies are using to turn digital innovation into competitive advantage.
Speaker
Santiago Cabrera-Naranjo, Consulting Director, Teradata
Risk listening: monitoring for profitable growth (DataWorks Summit)
Historically, insurers used 50-, 100-, and 500-year flood models for risk evaluation and pricing. The extreme weather events we have experienced in 2017 alone prove how dated these methods really are.
To better understand their customers and potential current and future liability claims, forward-thinking insurers are monitoring, analyzing, and integrating external data sources in real time (weather feeds from USGS.gov, news and stock feeds, and satellite imagery, to name just a few). By integrating and injecting these new data sources into their risk models and underwriting, insurers are better able to define their risk appetites and price effectively.
The session will include real-world case studies, including how a global P&C insurer is now quickly analyzing and monitoring 50,000 customers and targets, gaining new insights into the market. Another example is a global reinsurance and specialty company that now leverages digital news channels to monitor its risk portfolio for early warning claims indicators to help drive down loss costs.
Speaker
Cindy Maike, VP Industry Solutions, GM of Insurance, Hortonworks
Practical experiences using Atlas and Ranger to implement GDPR (DataWorks Summit)
GDPR is the current focus of all organizations working with data in Europe. Svenska Spel is using Atlas and Ranger as essential components in making our data warehouse conform to GDPR.
This presentation will show how Atlas and Ranger, together with Hive, are used in practice to control access to and masking of data in our data warehouse. We will demonstrate how, with a model-driven development process based on data vault modeling, we have extended our model with metadata. From that metadata we generate our access control configuration, which is automatically synchronized to Atlas and Ranger during our deployment process.
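To make the generation step concrete, here is a minimal sketch, NOT Svenska Spel's actual generator, of deriving a Ranger masking policy from column classifications and pushing it through Ranger's public REST API. The endpoint, service name, group names, and metadata layout are all assumptions for illustration.

    import requests

    RANGER_URL = "https://ranger.example.com:6182"   # hypothetical endpoint

    # Metadata extracted from the data vault model: which columns hold PII.
    column_metadata = [
        {"table": "dv.customer_sat", "column": "email", "classification": "PII"},
        {"table": "dv.customer_sat", "column": "signup_date", "classification": "public"},
    ]

    for col in (c for c in column_metadata if c["classification"] == "PII"):
        db, table = col["table"].split(".")
        policy = {
            "service": "cm_hive",            # Ranger Hive service name (assumed)
            "name": "mask-%s-%s" % (table, col["column"]),
            "policyType": 1,                 # 1 = data masking policy
            "resources": {
                "database": {"values": [db]},
                "table": {"values": [table]},
                "column": {"values": [col["column"]]},
            },
            "dataMaskPolicyItems": [{
                "groups": ["analysts"],
                "accesses": [{"type": "select", "isAllowed": True}],
                "dataMaskInfo": {"dataMaskType": "MASK"},   # built-in full mask
            }],
        }
        # Create the policy; a real deployment pipeline would make this idempotent.
        requests.post(RANGER_URL + "/service/public/v2/api/policy",
                      json=policy, auth=("admin", "secret"), verify=False)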
Speaker
Magnus Runesson, Data Engineer, Svenska Spel
Data Driven Development of Autonomous Driving at BMW (DataWorks Summit)
The development of autonomous cars requires handling huge amounts of data produced by test vehicles and solving a number of critical challenges specific to the automotive industry.
In this talk we will describe these challenges and how we at BMW are overcoming them by adapting and reinventing existing big data solutions for our end-to-end autonomous driving data journey. Our journey involves ingesting data produced by a variety of sensors into a dedicated Hadoop cluster, decoding the data, conducting quality control, processing and storing the data on the clusters, making it searchable, analyzing it, and exposing it to the engineers working on algorithm development.
In the first part of the talk we will present a general overview of the challenges we faced and the lessons we learned from them. In the second part we will dive deep into the most interesting technical issues. These include: dealing with automotive formats and standards that are not designed for distributed processing; defragmentation of sensor data; assuring the quality of data coming from complex car hardware and software components; efficient search across petabytes of data; and re-running the car's computing components inside the data center, which typically requires high-performance computing.
Speakers:
Felix Reuthlinger, Data Engineer for Autonomous Driving, BMW Group
Dogukan Sonmez, Senior Software Engineer, BMW Group
How data modelling helps serve billions of queries in millisecond latency wit... (DataWorks Summit)
Users of financial data require their queries to return results with very low latency. As a financial data service provider, Bloomberg needs to consistently meet these requirements for our clients.
HBase promises millisecond latency, auto-sharding, and an open schema. However, as HBase is a NoSQL database, support for transaction processing is not trivial. This talk will discuss a data modelling technique and use case where effective transaction processing can be achieved. We will also discuss how this data model helps us achieve real-time streaming, scalability, and millisecond read-write latency for billions of queries each day.
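For a flavor of the kind of modelling involved, here is a small sketch of one common pattern: a composite row key that clusters all rows for one instrument and reverse-sorts by time so the newest data comes first. It uses the happybase client; the host, table, and column names are illustrative and NOT Bloomberg's actual schema.

    import happybase

    connection = happybase.Connection("hbase-thrift.example.com")
    table = connection.table("quotes")

    def row_key(instrument_id, ts_millis):
        # Reversed timestamp: newer quotes sort before older ones.
        return ("%s#%019d" % (instrument_id, 2**63 - ts_millis)).encode()

    # A single-row put is atomic in HBase; row-level atomicity is the building
    # block that transaction-friendly data models lean on.
    table.put(row_key("AAPL", 1560000000000),
              {b"q:bid": b"195.10", b"q:ask": b"195.12"})

    # Latest N quotes for one instrument: a short, cheap prefix scan.
    for key, data in table.scan(row_prefix=b"AAPL#", limit=5):
        print(key, data)

Because related rows are physically contiguous, both the point lookup and the short scan touch a single region, which is what keeps read latency in the millisecond range.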
Data Offload for the Chief Data Officer – how to move data onto Hadoop withou... (DataWorks Summit)
“The CDO bears responsibility for the firm’s data and information strategy, governance, control, policy development, and exploitation of data assets to create business value.” (Source: Gartner)
In this session we will show how the CDO can manage and exploit all of a company’s data assets on Hadoop in a controlled manner: data quality is verified, security access is controlled, and all data activities are logged and recorded automatically in Atlas. We will demonstrate Bluemetrix Data Manager (BDM) and show how easy it is to ingest, transform, and control data on Hadoop while automatically deploying governance on Atlas.
Speaker
Liam English, CEO, Bluemetrix
With data exploding in volume, variety, and velocity, and creating new challenges, it is critical for organizations to adopt a database solution that is performant, resilient, and capable of handling massive amounts of polymorphic data. At the same time, they must reduce the total cost of ownership (TCO) to free up budgets for business transformation programs. Modernization should employ an industrial methodology to drive the needed savings, but organizations should prepare for the risks and challenges of database modernization/migration at an industrial scale. Delivered at Postgres Vision 2018, this presentation by Gautam Khanna, Vice President and Global Head of Open Source Center of Excellence, Infosys, addresses the benefits of migrating to Postgres using its software plus services approach.
As open source databases become the enterprise standard, making all data available and accessible for AI has become an even bigger challenge. In the presentation delivered at Postgres Vision 2018, Rob Thomas, General Manager of IBM Analytics, provided answers for how companies can prepare their Information Architecture for AI, leverage containers and multi-cloud for innovation, and deliver a data and analytics strategy at scale.
Lufthansa Reference Architecture for the OpenGroup (Capgemini)
The presentation describes the aviation reference architecture framework, including the meta model and a domain overview.
Authors of paper and presentation:
Kai Schröder, Carsten Brockmann, and Eldar Sultanow (Capgemini Germany)
Carsten Breithaupt and Christian Vollmer (Lufthansa)
Postgres Vision 2018: How to Consume your Database Platform On-premises (EDB)
The usual model for a database platform on-premises is to run it the way IT is usually operated: siloed, capital-intensive, and labor-intensive. In the cloud, consumption means that you pay for what you use, with less heavy lifting to operate the platform. Presented at Postgres Vision 2018, this presentation covers how HPE can deliver EDB Postgres in the data center or on the edge in a consumption model that is pay-per-use and elastic, operated for you, migrated, and integrated.
The hybrid cloud computing market is analyzed based on four segments: solutions, service models, verticals, and regions. The solutions segment includes application architecture, network integration, and management systems. The application architecture segment is expected to play a major role in the hybrid cloud computing market.
According to Infoholic research, the “Worldwide Hybrid Cloud Computing Market” is expected to grow at a CAGR of 34.3% during the forecast period 2016–2022.
Hear how Manulife Asia has built an environment that enables the company to solve business-critical problems across many countries. What began in 2017 as an update to their enterprise architecture now spans everything from infrastructure to applications, powering their entire digital backbone. It includes fraud identification, real-time investment dashboards, advanced analytics and machine learning, and digital connection apps that talk to customers for claims, support, and more. Learn the importance that hard work, coordination, discipline, and an agile methodology play in deciding which use cases to focus on to deliver new services in an environment where everything is time sensitive and business requirements shift regularly.
Speaker: Ellen Wu, Head of Asia Data Office, Global Data Enablement and Governance, Manulife
At Postgres Vision 2018, Lauren Nelson, Principal Analyst, Forrester, provided a look into the practical considerations that are influencing modern cloud strategies, including existing skill sets and technology limitations, the assortment of current and future cloud workloads, and the economics and realities of today's technology options.
This presentation was delivered at the BI SIG in Palo Alto. It provides an overview of the market shift away from on-premise solutions to on-demand in the business intelligence industry.
EnterpriseDB CEO and President Ed Boyajian opened Postgres Vision 2018 with this presentation providing a look at enterprise activity in the cloud and how Postgres can extend across the IT infrastructure, from on-premises to the cloud.
With the explosive growth of IoT, the edge is predicted to reach 25 billion connected devices by 2020. Yet enterprises are still struggling to manage the hundreds of devices they have already deployed, not from a device management standpoint but from a data management one. Enterprises are unable to capture and process data directly from edge devices for immediate analysis and real-time actionable intelligence, and when that is not possible, IoT initiatives fail. How can an enterprise gather real-time data from edge devices? How can it change the behavior of its data collection processes? How can it ensure that data will be analyzed immediately? How can it understand the lineage of data from edge to enterprise? How can it manage edge agents? What is an edge management hub? Attend this session to get a detailed understanding of key edge management challenges and how to address them with the right solutions.
Making Enterprise Big Data Small with Ease (Hortonworks)
Every division in an organization builds its own database to keep track of its business. As the organization grows, those individual databases grow as well, and the data in each one can become siloed, with no awareness of the data in the others.
https://hortonworks.com/webinar/making-enterprise-big-data-small-ease/
Postgres Vision 2018: Making Modern an Old Legacy System (EDB)
A New England insurance company had aging hardware, a database that was out of support, an older operating system, rising costs, and no disaster recovery plan. Craig Bogovich of NTT Data tackled this massive website backend, used by the company's insureds, providers, and partners, architected a complete overhaul, and ultimately deployed it into the cloud. Presented at Postgres Vision 2018, this presentation shows how the project unfolded and the strategies and methods used to modernize this legacy system with open source software and cloud technology.
Postgres, the leading open source relational database, is positioned as the centerpiece of a pivot from traditional architectures to a microservices-based approach in full support of a DevOps motion.
Presented by Marc Linster, Senior Vice President of Product Development at EnterpriseDB, this presentation explores how Postgres meets the key requirements of DevOps. Linster explains how Postgres is developer friendly, supporting the process with a versatile data model using JSONB and integrating other data sources using Foreign Data Wrappers, and how Postgres supports rapid deployment in the cloud and on premises.
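For readers unfamiliar with the two features named above, here is a minimal sketch of each, driven from Python with psycopg2; the connection string, table, and remote server are illustrative placeholders.

    import psycopg2

    conn = psycopg2.connect("dbname=app user=dev")
    cur = conn.cursor()

    # JSONB: keep semi-structured payloads next to relational columns and
    # query inside them with the ->> operator.
    cur.execute("CREATE TABLE IF NOT EXISTS events (id serial PRIMARY KEY, body jsonb)")
    cur.execute("INSERT INTO events (body) VALUES (%s)",
                ('{"type": "signup", "plan": "pro"}',))
    cur.execute("SELECT body->>'plan' FROM events WHERE body->>'type' = 'signup'")
    print(cur.fetchall())

    # Foreign Data Wrapper: register an external Postgres database so its
    # tables can be queried in place (user mappings and IMPORT FOREIGN SCHEMA
    # would follow in a real setup).
    cur.execute("CREATE EXTENSION IF NOT EXISTS postgres_fdw")
    cur.execute("""
        CREATE SERVER IF NOT EXISTS legacy_srv FOREIGN DATA WRAPPER postgres_fdw
        OPTIONS (host 'legacy.example.com', dbname 'erp')
    """)
    conn.commit()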
Freddie Mac makes homeownership and rental housing more accessible and affordable. Operating in the secondary mortgage market, we keep mortgage capital flowing by purchasing mortgage loans from lenders so they in turn can provide more loans to qualified borrowers. Our mission to provide liquidity, stability, and affordability to the U.S. housing market in all economic conditions extends to all communities from coast to coast.
We're using big data and advanced analytics to create powerful enhancements that better meet our customers' needs: automated collateral evaluation, automated assessments for borrowers without credit scores, immediate certainty for collateral rep and warranty relief, and, coming soon, automated asset and income validation.
We’re building tools to help our customers cut costs and give them rep and warranty relief sooner in the loan manufacturing process.
We’ve designed Loan Advisor Suite with lenders to give our customers greater certainty, usability, reliability and efficiency. It's a simpler, better way to do business.
More Tools - Access powerful solutions for every stage of the loan production process.
More Loans - Increase output with automated data management and user-friendly controls.
Less Risk - Get alerted to loan issues and take action the moment they occur.
Hear the story of how ACE helped Freddie Mac reimagine the mortgage process and how HDP helped make it possible.
Speaker
Dennis Tally, Director, Freddie Mac
Watch here: https://bit.ly/2D1fqB6
Today’s evolving data landscape has spawned new business challenges that require innovative solutions. These challenges include:
- Strategic decision-making, which relies on multiple perspectives such as social and economic factors that require combining internal and external data.
- Accounting for the increased volume and structural complexity of today’s data, and the increased frequency required in delivering data assets.
- Coping with data silos that house data that must be combined and provisioned to support decision-making.
- Exposing purpose-built analytics, such as supply chain, for consumption in order to expedite decision-making.
Attend this session to learn how Data as a Service, fueled by data virtualization, overcomes these common challenges from the three dimensions of:
- Provisioning information-rich external data assets,
- Connecting data silos, and
- Enabling pre-built and packaged analytics.
Driving Digital Transformation Through Global Data Management (Hortonworks)
Using your data smarter and faster than your peers could be the difference between dominating your market and merely surviving. Organizations are investing in IoT, big data, and data science to drive better customer experiences and create new products, yet these projects often stall in the ideation phase due to a lack of global data management processes and technologies. Your new data architecture may be taking shape around you, but your goal of globally managing, governing, and securing your data across a hybrid, multi-cloud landscape can remain elusive. Learn how industry leaders are developing their global data management strategies to drive innovation and ROI.
Presented at Gartner Data and Analytics Summit
Speaker:
Dinesh Chandrasekhar
Director of Product Marketing, Hortonworks
Native Spark Executors on Kubernetes: Diving into the Data Lake - Chicago Clo... (Mariano Gonzalez)
Everybody wants to do big data on a data lake! However, implementing one and maintaining the infrastructure necessary to explore it, such as Spark, has historically been a challenging endeavor. Kubernetes is the tool of choice for cloud orchestration, and Spark continues to be the de facto framework for most data wrangling tasks. We previously tried different data lake architectures and suffered the pain that Hadoop carries with it. Finally, we decided to bring the best of the cloud and big data worlds together, and will walk you through how to set up an endless data lake powered by native Spark executors on Kubernetes.
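As a taste of what this looks like, here is a minimal sketch of a PySpark session whose executors run as native Kubernetes pods, assuming Spark 2.4+ and a reachable API server; the URL, image, namespace, and bucket are placeholders.

    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .master("k8s://https://k8s-api.example.com:6443")
        .appName("data-lake-exploration")
        # Executors are launched as native pods built from this image.
        .config("spark.kubernetes.container.image", "registry.example.com/spark:2.4.0")
        .config("spark.kubernetes.namespace", "data-lake")
        .config("spark.executor.instances", "10")
        .getOrCreate()
    )

    # Explore the lake directly from object storage instead of HDFS
    # (assumes the S3A connector jars are on the image's classpath).
    df = spark.read.parquet("s3a://data-lake/events/")
    df.groupBy("event_type").count().show()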
Defining a Digitalization Reference Architecture for the Pharma Industry (Capgemini)
The rapid pace of technological advancement and an increasingly volatile business environment cause pharma industry players to face three main challenges. The first challenge is to develop the critical ability to create and maintain a common view of the big picture of the organization’s architecture and environment. Linked to this is the challenge of deciding which actions must be taken at the strategic, business, and technological levels. The last challenge consists of identifying the best architectural approach and solution at a fine-grained level. This may include answering questions such as: “Which approach should we choose to implement a read replica in the cloud (transactional replication using publish-subscribe between relational database systems, streaming into a Redis cache, …)?”
We develop a digitalization reference architecture (RA) for the pharma industry that incorporates three main areas: Internet of Things (IoT), Cognitive Computing (CC), and Augmented Reality (AR).
Digitalization is horizontally divisible into four sequentially arranged domains that can be understood as the stages of digitalization: Tag, Sense and Wire (1), Ingest (2), Analyze and Prepare (3), and Utilize (4). Technology components may be used across different industries. Applications developed and running in different industries may interact or be integrated with each other and exchange data through industry-specific applications (cold chain monitoring during transportation, healthcare research data for the public, origination and distribution information for authorities). For this reason, our framework possesses a third dimension: connected industries (in pharma these include transportation & logistics and the public sector). This approach generates a big picture that transparently illustrates the pharma RA for digitalization, including its stages, architecture layers, and connections between building blocks across organizational boundaries and different industries.
Authors of paper and presentation: Eldar Sultanow, Carsten Brockmann, and Levent Sözer, Capgemini Germany
Postgres Vision 2018: Your Migration Path - Rabobank and a New DBaaS (EDB)
Niels Zegveld, Manager, Engineering Database and Middleware of Rabobank, presented a case study at Postgres Vision 2018 that explained building a new Database-as-a-Service (DBaaS) with EDB Postgres so that IT managers would no longer have to interact with the OS.
In this Accenture document we explore the implications, challenges, and impacts of the General Data Protection Regulation (GDPR), and touch on the opportunities this regulation creates for financial services firms. Learn more: https://accntu.re/2uq8ANV
The GDPR Most Wanted: The Marketer and Analyst's Role in Compliance (ObservePoint)
This eBook outlines the role marketers and analysts play in helping their companies:
- Govern all existing web and app technologies
- Collect, store and analyze data properly
- Ensure ethical marketing and analytics practices
Date: 15th November 2017
Location: AI Lab Theatre
Time: 16:30 - 17:00
Speaker: Elisabeth Olafsdottir / Santiago Castro
Organisation: Microsoft / Keyrus
GDPR: 20 Million Reasons to get ready - Part 1: Preparing for compliance (Cloudera, Inc.)
The first webinar of the series starts at the beginning: preparing for GDPR compliance. In this session, we look at how technology and process come together to let organisations get to grips with the GDPR-relevant data that flows around their companies and work towards compliance. We will give you practical examples of how to apply data discovery, data minimisation, data protection, and security, as well as the role of the record of processing in all of this.
This webinar, featuring guests from the EU Commission, the French data regulator CNIL, DLA Piper, and IBM, provided an overview of the new EU data protection and privacy regulation from the perspectives of the regulation's author, a regulator, a legal advisor, and technology providers.
Explain your algorithmic decisions for GDPR (Pierre Feillet)
What are the challenges of GDPR coming in 2018? We share an overview of the regulation and zoom in on its algorithmic aspects. We present best practices in decision automation that place symbolic AI in complement to ML, and then introduce eXplainable AI.
This edition of The CEO Views brings you the “Top 10 GDPR Solution Providers 2020”. The list highlights some of the GDPR solution providers offering best-in-class capabilities in the technology landscape, and aims to help individuals and organizations find the companies best suited to their projects.
Governance, management & compliance
Today, not only large companies but also SMEs understand the economic and strategic value of big data collected from customers, prospects, partners, and others. However, it is not only a question of collecting data, but also of analyzing and managing it, ensuring its quality according to principles of accuracy, reliability, and minimization.
In fact, organizations must be able to process such huge volumes of data legitimately and profitably, mapping data collection, management, and storage activities to analysis programs that comply with the regulations in force (GDPR first and foremost).
Corporate governance today therefore runs through data governance and data analytics, and the ability to organize the company to exploit the power of data.
Impact of GDPR on Third Party and M&A Security (EQS Group)
GDPR's impact has been dissected and examined to death; however, M&A activities and third-party security posture can be greatly affected as well, and this aspect is rarely examined. This session aims to fill that gap.
BigID enables companies to automate core privacy and GDPR functions, such as data inventory, right-to-be-forgotten, consent, breach notification, and Article 30 record keeping, at scale.
Preparing for GDPR: What Every B2B Marketer Must Know (Integrate)
Considering the consequences of non-compliance (fines up to €20M/$24M or 4% of worldwide annual revenue), GDPR poses a major problem for B2B marketers.
How can your team ensure its lead gen processes are GDPR-compliant without undermining demand generation performance?
View this deck to see how Julian Archer (Sr. Research Director, SiriusDecisions) and Scott Vaughan (CMO, Integrate) educate B2B marketers on: developing a comprehensive GDPR compliance strategy, putting your compliance strategy into action, and applying software to support your compliance measures.
To watch the on-demand version of the webinar, click here:
https://www.integrate.com/gdpr-compliance-b2b-marketing-webinar
Data-Driven General Data Protection Regulation Compliance (BigID)
The European Union General Data Protection Regulation (GDPR) is a landmark in data privacy protection. In formalizing individual rights, including explicit consent, accountability, and processing transparency, the GDPR has teeth: regulators can impose hefty penalties of up to 4% of global revenues for violations. GDPR requires an enterprise to formalize how it manages and tracks personal data. BigID provides a next-generation big data approach to help companies meet the regulatory requirements of GDPR.
How to Prepare Your SAP System for the New European Union General Data Protection Regulation: learn how to change your practices within your SAP environment so that they comply with the new General Data Protection Regulation (GDPR) privacy rules.
As a general reference, the main transaction codes to access master data tables include:
• Create, change, and display customers, prospects, and contact persons (XD0*, VD0*, VAP*) and reporting-related lists (S_ALR_87012179, S_ALR_87012180)
• Create, change, and display vendors (XK0*, MK0*) and reporting-related lists (S_ALR_87012086)
• Create, change, and display employee (PA10, PA20, PA30) and applicant (PB10, PB20, PB30) files
• Create and maintain bank master data (FI01, FI02, FI06) and business partners (BP, BUP1)
• Maintain general tables (SE11, SM30, SM31)
• Browse data (SE16) and display a table (SE16N)
Similar to "GDPR: the IBM journey to compliance"
Introduction: This workshop will provide a hands-on introduction to Machine Learning (ML) with an overview of Deep Learning (DL).
Format: An introductory lecture on several supervised and unsupervised ML techniques, followed by a light introduction to DL and a short discussion of the current state of the art. Several Python code samples using the scikit-learn library will be introduced, which users will be able to run in the Cloudera Data Science Workbench (CDSW).
Objective: To provide a quick, hands-on introduction to ML with Python's scikit-learn library. The environment in CDSW is interactive, and the step-by-step guide will walk you through setting up your environment, exploring datasets, and training and evaluating models on popular datasets. By the end of the crash course, attendees will have a high-level understanding of popular ML algorithms and the current state of DL, what problems they can solve, and will walk away with basic hands-on experience training and evaluating ML models.
Prerequisites: For the hands-on portion, registrants must bring a laptop with a Chrome or Firefox web browser. The labs will be done in the cloud; no installation is needed. Everyone will be able to register and start using CDSW after the introductory lecture concludes (about 1 hour in). Basic knowledge of Python is highly recommended.
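For a taste of the hands-on portion, the following short scikit-learn example is in the spirit of the labs: train a classifier on a built-in dataset and evaluate it on a held-out split. It runs in CDSW or any Python environment with scikit-learn installed; the model choice here is ours for illustration, not necessarily the workshop's.

    from sklearn.datasets import load_iris
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import accuracy_score
    from sklearn.model_selection import train_test_split

    # Load a popular built-in dataset and hold out 30% for evaluation.
    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=42)

    model = RandomForestClassifier(n_estimators=100, random_state=42)
    model.fit(X_train, y_train)

    # Evaluate on the held-out split.
    print("accuracy:", accuracy_score(y_test, model.predict(X_test)))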
Floating on a RAFT: HBase Durability with Apache Ratis (DataWorks Summit)
In a world with a myriad of distributed storage systems to choose from, the majority of Apache HBase clusters still rely on Apache HDFS. Theoretically, any distributed file system could be used by HBase. One major reason HDFS predominates is the specific durability requirement of HBase's write-ahead log (WAL), which HDFS provides correctly. However, with sufficient effort, HBase's use of HDFS for WALs can be replaced.
This talk will cover the design of a "Log Service" that can be embedded inside HBase and provides the level of durability that HBase requires for its WALs. Apache Ratis (incubating) is a library implementation of the RAFT consensus protocol in Java and is used to build this Log Service. We will cover the design choices of the Ratis Log Service, comparing and contrasting it with other log-based systems that exist today. Next, we'll cover how the Log Service fits into HBase and the changes to HBase that enable it. Finally, we'll discuss how the Log Service can simplify the operational burden of HBase.
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi (DataWorks Summit)
Utilizing Apache NiFi, we read various open data REST APIs and camera feeds to ingest crime and related data, streaming it in real time into HBase and Phoenix tables. HBase makes an excellent storage option for our real-time time-series data sources. We can immediately query our data with Apache Zeppelin against Phoenix tables, as well as through Hive external tables over HBase.
Apache Phoenix tables also make a great option since we can easily put microservices on top of them for application usage. I have an example Spring Boot application that reads from our Philadelphia crime table for front-end web applications as well as RESTful APIs.
Apache NiFi makes it easy to push records with schemas to HBase and insert into Phoenix SQL tables.
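For a feel of the Phoenix side of this pipeline, here is a minimal sketch of creating a crime table and upserting a record through the Phoenix Query Server with the phoenixdb client, the same kind of write NiFi's record processors perform; the endpoint and schema are placeholders, not the talk's exact tables.

    import datetime
    import phoenixdb

    conn = phoenixdb.connect("http://phoenix-queryserver.example.com:8765/",
                             autocommit=True)
    cursor = conn.cursor()

    cursor.execute("""
        CREATE TABLE IF NOT EXISTS crime (
            incident_id VARCHAR PRIMARY KEY,
            occurred_at TIMESTAMP,
            offense     VARCHAR,
            district    VARCHAR
        )
    """)

    # Phoenix uses UPSERT rather than INSERT: the same statement creates or
    # updates a row, which suits continuous ingestion of evolving records.
    cursor.execute(
        "UPSERT INTO crime VALUES (?, ?, ?, ?)",
        ("2019-123456", datetime.datetime(2019, 5, 1, 22, 15), "THEFT", "22"))

    # Immediately queryable with plain SQL, e.g. from a microservice.
    cursor.execute("SELECT offense, COUNT(*) FROM crime GROUP BY offense")
    print(cursor.fetchall())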
Resources:
https://community.hortonworks.com/articles/54947/reading-opendata-json-and-storing-into-phoenix-tab.html
https://community.hortonworks.com/articles/56642/creating-a-spring-boot-java-8-microservice-to-read.html
https://community.hortonworks.com/articles/64122/incrementally-streaming-rdbms-data-to-your-hadoop.html
HBase Tales From the Trenches - Short stories about most common HBase operati... (DataWorks Summit)
Whilst HBase is the most logical answer for use cases requiring random, real-time read/write access to big data, it is not trivial to design applications that make the most of it, nor is it the simplest system to operate. Because it depends on and integrates with other components from the Hadoop ecosystem (ZooKeeper, HDFS, Spark, Hive, etc.) and external systems (Kerberos, LDAP), and because its distributed nature requires a "Swiss clockwork" infrastructure, many variables must be considered when investigating anomalies or even outages. Adding to the equation, HBase is still an evolving product, with different release versions currently in use, some of which carry genuine software bugs. In this presentation, we'll go through the most common HBase issues faced by different organisations, describing the identified causes and resolution actions from my last five years supporting HBase for our heterogeneous customer base.
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac... (DataWorks Summit)
LocationTech GeoMesa enables spatial and spatiotemporal indexing and queries for HBase and Accumulo. In this talk, after an overview of GeoMesa’s capabilities in the Cloudera ecosystem, we will dive into how GeoMesa leverages Accumulo’s Iterator interface and HBase’s Filter and Coprocessor interfaces. The goal will be to discuss both what spatial operations can be pushed down into the distributed database and also how the GeoMesa codebase is organized to allow for consistent use across the two database systems.
OCLC has been using HBase since 2012 to enable single-search-box access to over a billion items from your library and the world's library collection. This talk will provide an overview of how HBase is structured to provide this information, some of the challenges OCLC has encountered in scaling to support the world catalog, and how they have been overcome.
Many individuals/organizations have a desire to utilize NoSQL technology, but often lack an understanding of how the underlying functional bits can be utilized to enable their use case. This situation can result in drastic increases in the desire to put the SQL back in NoSQL.
Since the initial commit, Apache Accumulo has provided a number of examples to help jumpstart comprehension of how some of these bits function as well as potentially help tease out an understanding of how they might be applied to a NoSQL friendly use case. One very relatable example demonstrates how Accumulo could be used to emulate a filesystem (dirlist).
In this session we will walk through the dirlist implementation. Attendees should come away with an understanding of the supporting table designs, a simple text search supporting a single wildcard (on file/directory names), and how the dirlist elements work together to accomplish its feature set. Attendees should (hopefully) also come away with a justification for sometimes keeping the SQL out of NoSQL.
HBase Global Indexing to support large-scale data ingestion at Uber (DataWorks Summit)
Data serves as the platform for decision-making at Uber. To facilitate data driven decisions, many datasets at Uber are ingested in a Hadoop Data Lake and exposed to querying via Hive. Analytical queries joining various datasets are run to better understand business data at Uber.
Data ingestion, at its most basic, is about organizing data to balance efficient reading and writing of new data. Organizing data for efficient reading involves factoring in query patterns and partitioning data to keep read amplification low. Organizing data for efficient writing involves factoring in the nature of the input data: whether it is append-only or updatable.
At Uber we ingest terabytes of data into many critical, updatable tables such as trips. These tables are a fundamental part of Uber's data-driven solutions and act as the source of truth for all analytical use cases across the entire company. Datasets such as trips constantly receive updates as well as inserts. To ingest such datasets we need a critical component that keeps bookkeeping information about the data layout and annotates each incoming change with the location in HDFS where the data should be written. This component is called Global Indexing. Without it, all records would be treated as inserts and re-written to HDFS instead of being updated, duplicating data and breaking correctness for user queries. Global Indexing is key to scaling our ingestion jobs, which now handle more than 500 billion writes a day, and it must offer strong consistency and high throughput for index writes and reads.
At Uber, we chose HBase as the backing store for the Global Indexing component. In this talk, we will discuss data@Uber and expound on why we built the global index using Apache HBase and how it helps scale our cluster usage. We'll detail why we chose HBase over other storage systems, how and why we came up with a creative solution that loads HFiles directly into the backend, circumventing the normal write path when bootstrapping ingestion tables to avoid QPS constraints, as well as other lessons learned bringing this system to production at the scale of data Uber encounters daily.
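A much-simplified sketch of the lookup described above, using the happybase client: for each incoming record key, the HBase index says whether the key was seen before and where it lives, so the change can be tagged as an update rather than an insert. The table and column names are illustrative, not Uber's actual schema.

    import happybase

    connection = happybase.Connection("hbase-index.example.com")
    index = connection.table("trips_global_index")

    def tag_record(record_key, partition, file_group):
        existing = index.row(record_key.encode())
        if existing:
            # Seen before: route the change to the file group already holding it.
            return ("update", existing[b"loc:file_group"].decode())
        # First sighting: remember where this key will live, then insert.
        index.put(record_key.encode(),
                  {b"loc:partition": partition.encode(),
                   b"loc:file_group": file_group.encode()})
        return ("insert", file_group)

    print(tag_record("trip-8f3a", "2019/06/01", "fg-0042"))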
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix (DataWorks Summit)
Recently, Apache Phoenix has been integrated with the Apache Omid (incubating) transaction processing service to provide ultra-high system throughput with ultra-low latency overhead. Phoenix has been shown to scale beyond 0.5M transactions per second with sub-5ms latency for short transactions on industry-standard hardware. Omid, in turn, has been extended to support secondary indexes, multi-snapshot SQL queries, and massive-write transactions.
These innovative features make Phoenix an excellent choice for translytics applications, which allow converged transaction processing and analytics. We share the story of building the next-gen data tier for advertising platforms at Verizon Media that exploits Phoenix and Omid to support multi-feed real-time ingestion and AI pipelines in one place, and discuss the lessons learned.
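For concreteness, here is a minimal sketch of what a transactional Phoenix table with Omid as the provider looks like from a client, assuming Phoenix 4.14+/5.x with the Omid transaction service configured; the endpoint and schema are illustrative.

    import phoenixdb

    conn = phoenixdb.connect("http://phoenix-queryserver.example.com:8765/",
                             autocommit=False)
    cursor = conn.cursor()

    # Declare the table transactional, with Omid as the transaction provider.
    cursor.execute("""
        CREATE TABLE IF NOT EXISTS ad_spend (
            campaign_id VARCHAR PRIMARY KEY,
            spend       DECIMAL
        ) TRANSACTIONAL=true, TRANSACTION_PROVIDER='OMID'
    """)

    # Both upserts commit atomically or not at all, snapshot-isolated by Omid.
    cursor.execute("UPSERT INTO ad_spend VALUES ('camp-1', 100.0)")
    cursor.execute("UPSERT INTO ad_spend VALUES ('camp-2', 250.0)")
    conn.commit()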
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi (DataWorks Summit)
Cybersecurity requires an organization to collect data, analyze it, and alert on cyber anomalies in near real time. This is a challenging endeavor considering the variety of data sources that need to be collected and analyzed: everything from application logs, network events, authentication systems, IoT devices, business events, and cloud service logs needs to be taken into consideration. In addition, multiple data formats need to be transformed and conformed so they can be understood by both humans and ML/AI algorithms.
To solve this problem, the Aetna Global Security team developed the Unified Data Platform based on Apache NiFi, which allows them to remain agile and adapt to new security threats and the onboarding of new technologies in the Aetna environment. The platform currently has over 60 different data flows, 95% of them doing real-time ETL, and handles over 20 billion events per day. In this session, learn from Aetna's experience building an edge-to-AI, high-speed data pipeline with Apache NiFi.
In the healthcare sector, data security, governance, and quality are crucial for maintaining patient privacy and ensuring the highest standards of care. At Florida Blue, the leading health insurer of Florida serving over five million members, there is a multifaceted network of care providers, business users, sales agents, and other divisions relying on the same datasets to derive critical information for multiple applications across the enterprise. However, maintaining consistent data governance and security for protected health information and other extended data attributes has always been a complex challenge that did not easily accommodate the wide range of needs for Florida Blue’s many business units. Using Apache Ranger, we developed a federated Identity & Access Management (IAM) approach that allows each tenant to have their own IAM mechanism. All user groups and roles are propagated across the federation in order to determine users’ data entitlement and access authorization; this applies to all stages of the system, from the broadest tenant levels down to specific data rows and columns. We also enabled audit attributes to ensure data quality by documenting data sources, reasons for data collection, date and time of data collection, and more. In this discussion, we will outline our implementation approach, review the results, and highlight our “lessons learned.”
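As an illustration of the row-level control described above, the sketch below creates a row-filter policy through Ranger's public REST API. The Ranger host, service name, database, table, and group are hypothetical stand-ins, not Florida Blue's actual configuration.

```python
# A sketch of a row-level filter policy created via Ranger's public
# REST API; all names are placeholders (inline auth for brevity only).
import requests

policy = {
    "policyType": 2,                      # 2 = row-level filter policy
    "service": "hive_prod",               # assumed Ranger service name
    "name": "members_row_filter",
    "resources": {
        "database": {"values": ["claims"]},
        "table": {"values": ["members"]},
    },
    "rowFilterPolicyItems": [{
        "accesses": [{"type": "select", "isAllowed": True}],
        "groups": ["regional_sales"],
        # Users in this group see only their region's rows.
        "rowFilterInfo": {"filterExpr": "region = 'FL'"},
    }],
}

resp = requests.post("http://ranger-host:6080/service/public/v2/api/policy",
                     json=policy, auth=("admin", "password"))
resp.raise_for_status()
```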
Presto: Optimizing Performance of SQL-on-Anything EngineDataWorks Summit
Presto, an open source distributed SQL engine, is widely recognized for its low-latency queries, high concurrency, and native ability to query multiple data sources. Proven at scale in a variety of use cases at Airbnb, Bloomberg, Comcast, Facebook, FINRA, LinkedIn, Lyft, Netflix, Twitter, and Uber, Presto has in the last few years experienced unprecedented growth in popularity in both on-premises and cloud deployments over object stores, HDFS, NoSQL, and RDBMS data stores.
With the ever-growing list of connectors to new data sources such as Azure Blob Storage, Elasticsearch, Netflix Iceberg, Apache Kudu, and Apache Pulsar, the recently introduced Cost-Based Optimizer in Presto must account for heterogeneous inputs with differing and often incomplete data statistics. This talk explores that topic in detail and discusses the best use cases for Presto across several industries. In addition, we will present recent Presto advancements, such as geospatial analytics at scale, and the project roadmap going forward.
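The sketch below gives a feel for the "SQL-on-anything" model: a single query joining two catalogs through the presto-python-client. The catalog, schema, and table names are placeholders for whatever connectors a deployment exposes.

```python
# A sketch of one federated query spanning two catalogs; all
# catalog/schema/table names here are illustrative.
import prestodb

conn = prestodb.dbapi.connect(host="presto-coordinator", port=8080,
                              user="analyst", catalog="hive",
                              schema="default")
cur = conn.cursor()

# The Cost-Based Optimizer picks join order and distribution from each
# connector's statistics; missing stats fall back to heuristics.
cur.execute("""
    SELECT c.name, sum(o.total) AS spend
    FROM mysql.crm.customers AS c
    JOIN hive.sales.orders AS o ON o.customer_id = c.id
    GROUP BY c.name
""")
for row in cur.fetchall():
    print(row)
```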
Introducing MLflow: An Open Source Platform for the Machine Learning LifecycleDataWorks Summit
Specialized tools for machine learning development and model governance are becoming essential. MLflow is an open source platform for managing the machine learning lifecycle. By adding a few lines of code to the function or script that trains their model, data scientists can log parameters, metrics, artifacts (plots, miscellaneous files, etc.), and a deployable packaging of the ML model. Every time that function or script runs, the results are logged automatically as a byproduct of those added lines, even if the person running the training makes no special effort to record them. MLflow application programming interfaces (APIs) are available for Python, R, and Java, and MLflow sports a language-agnostic REST API as well. Over a relatively short period, MLflow has garnered more than 3,300 stars on GitHub, almost 500,000 monthly downloads, and 80 contributors from more than 40 companies. Most significantly, more than 200 companies are now using MLflow. We will demo the MLflow Tracking, Projects, and Models components with Azure Machine Learning (AML) Services and show how easy it is to get started with MLflow on-premises or in the cloud.
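A minimal, runnable sketch of the "few lines of code" pattern, using MLflow's Python tracking API with a scikit-learn model; the parameter and metric names are illustrative.

```python
# Log a parameter, a metric, and a deployable model in one run.
import mlflow
import mlflow.sklearn
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

with mlflow.start_run():
    mlflow.log_param("C", 0.5)                         # hyperparameter
    model = LogisticRegression(C=0.5, max_iter=200).fit(X, y)
    mlflow.log_metric("train_accuracy", model.score(X, y))
    mlflow.sklearn.log_model(model, "model")           # deployable packaging
```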
Extending Twitter's Data Platform to Google CloudDataWorks Summit
Twitter's Data Platform is built from multiple complex open source and in-house projects to support data analytics on hundreds of petabytes of data. The platform supports storage, compute, data ingestion, discovery, and management, along with various tools and libraries that help users with both batch and real-time analytics. It operates on multiple clusters across different data centers, helping thousands of users discover valuable insights. As we scaled the platform to multiple clusters, we also evaluated various cloud vendors to support use cases outside our data centers. In this talk we share our architecture and how we extended our data platform to use the cloud as another data center. We walk through our evaluation process, the challenges we faced supporting data analytics at Twitter scale in the cloud, and our current solution. Extending Twitter's data platform to the cloud was a complex task, which we examine in depth in this presentation.
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiDataWorks Summit
At Comcast, our team has been architecting a customer experience platform which is able to react to near-real-time events and interactions and deliver appropriate and timely communications to customers. By combining the low latency capabilities of Apache Flink and the dataflow capabilities of Apache NiFi we are able to process events at high volume to trigger, enrich, filter, and act/communicate to enhance customer experiences. Apache Flink and Apache NiFi complement each other with their strengths in event streaming and correlation, state management, command-and-control, parallelism, development methodology, and interoperability with surrounding technologies. We will trace our journey from starting with Apache NiFi over three years ago and our more recent introduction of Apache Flink into our platform stack to handle more complex scenarios. In this presentation we will compare and contrast which business and technical use cases are best suited to which platform and explore different ways to integrate the two platforms into a single solution.
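As a rough sketch of the trigger-enrich-filter-act pattern (not Comcast's actual pipeline), the PyFlink snippet below processes a small in-memory stream of events; the event shape and severity threshold are invented for illustration.

```python
# Trigger on an event type, enrich with a channel, filter by severity,
# then emit a communication action.
from pyflink.datastream import StreamExecutionEnvironment

env = StreamExecutionEnvironment.get_execution_environment()

events = env.from_collection([
    {"customer": "c1", "type": "outage", "severity": 3},
    {"customer": "c2", "type": "login", "severity": 0},
])

(events
 .filter(lambda e: e["type"] == "outage")                      # trigger
 .map(lambda e: {**e, "channel": "sms"})                       # enrich
 .filter(lambda e: e["severity"] >= 2)                         # suppress noise
 .map(lambda e: "notify %s via %s" % (e["customer"], e["channel"]))  # act
 .print())

env.execute("customer-comms-sketch")
```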
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerDataWorks Summit
Companies are increasingly moving to the cloud to store and process data. One challenge they face is securing data across hybrid environments while managing policies centrally in an easy way. In this session, we will discuss how companies can use Apache Ranger to protect access to data both on-premises and in cloud environments. We will go into detail on the challenges of hybrid environments and how Ranger solves them. We will also cover how companies can further strengthen security by using Ranger to anonymize or tokenize data while moving it into the cloud and to de-anonymize it dynamically using Apache Hive, Apache Spark, or when accessing data from cloud storage systems. We will deep dive into Ranger's integration with AWS S3, AWS Redshift, and other cloud-native systems, and wrap up with an end-to-end demo showing how policies created in Ranger can manage access to data in different systems, anonymize or de-anonymize data, and track where data is flowing.
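For a sense of how dynamic masking looks in policy form, the snippet below shows the shape of a data-mask policy item that could be posted to the same public REST endpoint as the earlier Ranger sketch; the group name is hypothetical, while MASK_SHOW_LAST_4 is one of Ranger's built-in mask types.

```python
# The shape of a dynamic data-masking policy item (policyType 1).
mask_policy_item = {
    "accesses": [{"type": "select", "isAllowed": True}],
    "groups": ["analysts"],
    # Analysts see only the last four characters of the column; an
    # authorized group would get a separate MASK_NONE item instead.
    "dataMaskInfo": {"dataMaskType": "MASK_SHOW_LAST_4"},
}
```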
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...DataWorks Summit
Advanced Big Data processing frameworks have been proposed to harness the fast data transmission capability of Remote Direct Memory Access (RDMA) over high-speed networks such as InfiniBand, RoCEv1, RoCEv2, iWARP, and OmniPath. However, with the introduction of Non-Volatile Memory (NVM) and NVM Express (NVMe) based SSDs, these designs, along with the default Big Data processing models, need to be reassessed to discover the possibilities of further enhanced performance. In this talk, we will present NRCIO, a high-performance communication runtime for non-volatile memory over modern network interconnects that can be leveraged by existing Big Data processing middleware. We will show the performance of non-volatile memory-aware RDMA communication protocols using our proposed runtime and demonstrate its benefits by incorporating it into a high-performance in-memory key-value store, Apache Hadoop, Tez, Spark, and TensorFlow. Evaluation results illustrate that NRCIO can achieve up to 3.65x performance improvement for representative Big Data processing workloads on modern data centers.
Background: Some early applications of Computer Vision in Retail arose from e-commerce use cases, but increasingly it is being used in physical stores in a variety of new and exciting ways, such as:
● Optimizing merchandising execution, in-stocks, and sell-through
● Enhancing operational efficiencies and enabling real-time customer engagement
● Enhancing loss prevention capabilities and response time
● Creating frictionless experiences for shoppers
Abstract: This talk will cover the use of Computer Vision in Retail and its implications for the broader Consumer Goods industry, and will share the business drivers, use cases, and benefits unfolding as it becomes an integral component in the remaking of an age-old industry.
We will also take a ‘peek under the hood’ of Computer Vision and Deep Learning, sharing technology design principles and skill set profiles to consider before starting your CV journey.
Deep learning has matured considerably in the past few years, producing human or superhuman abilities in a variety of computer vision paradigms. We will discuss ways to recognize these paradigms in retail settings and to collect and organize data to create actionable outcomes from the new insights and applications that deep learning enables.
We will cover the basics of object detection, then move into the advanced processing of images, describing possible ways a retail store of the near future could operate: a deep learning system attached to a camera stream can identify various storefront situations, such as item stock levels on shelves, a shelf in need of organization, or a wandering customer in need of assistance.
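A minimal sketch of the detection step on a single camera frame, using torchvision's pretrained COCO detector as a stand-in for a retail-trained model; the image path is a placeholder, and a production system would run this continuously over the video stream.

```python
# Detect objects in one frame and keep the confident detections.
import torch
import torchvision
from torchvision.transforms.functional import to_tensor
from PIL import Image

model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

frame = Image.open("shelf_frame.jpg")      # one frame from the stream
with torch.no_grad():
    detections = model([to_tensor(frame)])[0]

# Mapping labels to shelf/stock semantics is where the retail-specific
# training comes in; this just prints raw detections.
for box, label, score in zip(detections["boxes"], detections["labels"],
                             detections["scores"]):
    if score.item() > 0.8:
        print(label.item(), round(score.item(), 2), box.tolist())
```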
We will also cover how to use a computer vision system to automatically track customer purchases to enable a streamlined checkout process, and how deep learning can power plausible wardrobe suggestions based on what a customer is currently wearing or purchasing.
Finally, we will cover the various technologies powering these applications today: deep learning tools for research and development, production tools to distribute that intelligence to the entire inventory of cameras situated around a retail location, and tools for exploring and understanding the new data streams produced by computer vision systems.
By the end of this talk, attendees should understand the impact Computer Vision and Deep Learning are having on the Consumer Goods industry, along with the key use cases, techniques, and considerations that leaders are exploring and implementing today.
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkDataWorks Summit
Whole-genome shotgun-based next-generation transcriptomics and metagenomics studies often generate 100 to 1,000 gigabytes (GB) of sequence data derived from tens of thousands of different genes or microbial species. De novo assembly of these data requires a solution that both scales with data size and optimizes for individual genes or genomes. Here we developed an Apache Spark-based scalable sequence clustering application, SparkReadClust (SpaRC), that partitions reads based on their molecule of origin to enable downstream assembly optimization. SpaRC produces high clustering performance on transcriptomics and metagenomics test datasets from both short-read and long-read sequencing technologies. It achieves near-linear scalability with respect to input data size and number of compute nodes, and it runs on different cloud computing environments without modification while delivering similar performance. In summary, our results suggest SpaRC provides a scalable solution for clustering billions of reads from next-generation sequencing experiments, and that Apache Spark represents a cost-effective solution with rapid development/deployment cycles for similar big data genomics problems.
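The toy PySpark sketch below illustrates only the shuffle pattern at the heart of read clustering, grouping reads that share k-mers as candidate same-molecule clusters; SpaRC's actual algorithm is considerably more sophisticated, and the reads and k-mer length here are toy values.

```python
# Group reads by shared k-mers as a crude clustering signal.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("read-clustering-sketch").getOrCreate()
sc = spark.sparkContext
K = 5  # toy k-mer length; real pipelines use much larger k

reads = sc.parallelize([("r1", "ACGTACGTAC"), ("r2", "CGTACGTACG"),
                        ("r3", "TTTTGGGGCC")])

kmer_to_read = reads.flatMap(
    lambda r: [(r[1][i:i + K], r[0]) for i in range(len(r[1]) - K + 1)])

clusters = (kmer_to_read.groupByKey()
            .map(lambda kv: tuple(sorted(set(kv[1]))))
            .filter(lambda members: len(members) > 1)
            .distinct())

print(clusters.collect())  # e.g. [('r1', 'r2')]
spark.stop()
```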
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Solutions Apricot)Jeffrey Haguewood
Sidekick Solutions uses Bonterra Impact Management (fka Social Solutions Apricot) and automation solutions to integrate data for business workflows.
We believe integration and automation are essential to user experience and to the promise of efficient work through technology, and automation is the critical ingredient in realizing that vision. We develop integration products and services for Bonterra Case Management software to support the deployment of automations for a variety of use cases.
This video focuses on the notifications, alerts, and approval requests using Slack for Bonterra Impact Management. The solutions covered in this webinar can also be deployed for Microsoft Teams.
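A notification automation of this kind typically reduces to a webhook call. The minimal Python sketch below posts a case-status message to a Slack incoming webhook; the URL and field names are placeholders, not part of the Bonterra product, and Teams incoming webhooks accept a similar JSON payload.

```python
# Post a case-status notification to a Slack incoming webhook.
import requests

WEBHOOK_URL = "https://hooks.slack.com/services/T000/B000/XXXX"

def notify_case_update(case_id: str, status: str) -> None:
    payload = {"text": "Case %s moved to status: %s" % (case_id, status)}
    resp = requests.post(WEBHOOK_URL, json=payload, timeout=10)
    resp.raise_for_status()

notify_case_update("A-1042", "Approved")
```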
Interested in deploying notification automations for Bonterra Impact Management? Contact us at sales@sidekicksolutionsllc.com to discuss next steps.
Key Trends Shaping the Future of Infrastructure.pdfCheryl Hung
Keynote at DIGIT West Expo, Glasgow on 29 May 2024.
Cheryl Hung, ochery.com
Sr Director, Infrastructure Ecosystem, Arm.
The keynote covers the key trends across hardware, cloud, and open source, exploring how these areas are likely to mature and develop over the short and long term, and considering how organisations can position themselves to adapt and thrive.
JMeter webinar - integration with InfluxDB and GrafanaRTTS
Watch this recorded webinar about real-time monitoring of application performance. See how to integrate Apache JMeter, the open-source leader in performance testing, with InfluxDB, the open-source time-series database, and Grafana, the open-source analytics and visualization application.
In this webinar, we will review the benefits of leveraging InfluxDB and Grafana when executing load tests and demonstrate how these tools are used to visualize performance metrics.
Length: 30 minutes
Session Overview
-------------------------------------------
During this webinar, we will cover the following topics while demonstrating the integrations of JMeter, InfluxDB and Grafana:
- What out-of-the-box solutions are available for real-time monitoring of JMeter tests?
- What are the benefits of integrating InfluxDB and Grafana into the load testing stack?
- Which features are provided by Grafana?
- Demonstration of InfluxDB and Grafana using a practice web application
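To give a flavor of the integration, the sketch below queries the metrics that JMeter's Backend Listener writes to InfluxDB 1.x, assuming the listener's default "jmeter" measurement; the database name, transaction label, and time window are whatever you configured, and Grafana dashboards chart these same series.

```python
# Pull per-minute average response times for one transaction.
from influxdb import InfluxDBClient

client = InfluxDBClient(host="localhost", port=8086, database="jmeter")
result = client.query(
    'SELECT mean("avg") FROM "jmeter" '
    "WHERE \"transaction\" = 'login' AND time > now() - 15m "
    "GROUP BY time(1m)")
for point in result.get_points():
    print(point["time"], point["mean"])
```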
To view the webinar recording, go to:
https://www.rttsweb.com/jmeter-integration-webinar
GraphRAG is All You need? LLM & Knowledge GraphGuy Korland
Guy Korland, CEO and Co-founder of FalkorDB, will review two articles on the integration of language models with knowledge graphs.
1. Unifying Large Language Models and Knowledge Graphs: A Roadmap.
https://arxiv.org/abs/2306.08302
2. Microsoft Research's GraphRAG paper and a review paper on various uses of knowledge graphs:
https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
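A toy sketch of the GraphRAG pattern, retrieving facts from a knowledge graph and grounding an LLM prompt in them; it assumes the falkordb Python client, and the graph contents, Cypher query, and LLM call are placeholders.

```python
# Retrieve graph facts, then build a grounded prompt for an LLM.
from falkordb import FalkorDB

db = FalkorDB(host="localhost", port=6379)
g = db.select_graph("companies")

question = "Who founded FalkorDB?"
rows = g.query(
    "MATCH (p:Person)-[:FOUNDED]->(c:Company {name: 'FalkorDB'}) "
    "RETURN p.name").result_set

facts = "; ".join(row[0] for row in rows)
prompt = ("Answer using only these facts: %s\nQuestion: %s"
          % (facts, question))
# answer = my_llm.complete(prompt)  # plug in the LLM of your choice
print(prompt)
```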
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Albert Hoitingh
In this session I delve into the encryption technology used in Microsoft 365 and Microsoft Purview, including the concepts of Customer Key and Double Key Encryption.
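To illustrate the Double Key Encryption idea conceptually (this is not Microsoft's implementation), the sketch below wraps a content key under two independent keys, so that neither party can decrypt alone.

```python
# Conceptual double wrapping of a content key with two keys.
from cryptography.fernet import Fernet

customer_key = Fernet(Fernet.generate_key())   # held only by the customer
service_key = Fernet(Fernet.generate_key())    # held by the service

content_key = Fernet.generate_key()
document = Fernet(content_key).encrypt(b"confidential contents")

# Wrap the content key twice; unwrapping requires both keys, in order.
wrapped = customer_key.encrypt(service_key.encrypt(content_key))

recovered = service_key.decrypt(customer_key.decrypt(wrapped))
assert Fernet(recovered).decrypt(document) == b"confidential contents"
```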
Generating a custom Ruby SDK for your web service or Rails API using Smithyg2nightmarescribd
Have you ever wanted a Ruby client API to communicate with your web service? Smithy is a protocol-agnostic language for defining services and SDKs. Smithy Ruby is an implementation of Smithy that generates a Ruby SDK using a Smithy model. In this talk, we will explore Smithy and Smithy Ruby to learn how to generate custom feature-rich SDKs that can communicate with any web service, such as a Rails JSON API.
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...UiPathCommunity
💥 Speed, accuracy, and scaling – discover the superpowers of GenAI in action with UiPath Document Understanding and Communications Mining™:
See how to accelerate model training and optimize model performance with active learning
Learn about the latest enhancements to out-of-the-box document processing – with little to no training required
Get an exclusive demo of the new family of UiPath LLMs – GenAI models specialized for processing different types of documents and messages
This is a hands-on session specifically designed for automation developers and AI enthusiasts seeking to enhance their knowledge in leveraging the latest intelligent document processing capabilities offered by UiPath.
Speakers:
👨🏫 Andras Palfi, Senior Product Manager, UiPath
👩🏫 Lenka Dulovicova, Product Program Manager, UiPath
Essentials of Automations: Optimizing FME Workflows with ParametersSafe Software
Are you looking to streamline your workflows and boost your projects’ efficiency? Do you find yourself searching for ways to add flexibility and control over your FME workflows? If so, you’re in the right place.
Join us for an insightful dive into the world of FME parameters, a critical element in optimizing workflow efficiency. This webinar marks the beginning of our three-part “Essentials of Automation” series. This first webinar is designed to equip you with the knowledge and skills to utilize parameters effectively: enhancing the flexibility, maintainability, and user control of your FME projects.
Here’s what you’ll gain:
- Essentials of FME Parameters: Understand the pivotal role of parameters, including Reader/Writer, Transformer, User, and FME Flow categories. Discover how they are the key to unlocking automation and optimization within your workflows.
- Practical Applications in FME Form: Delve into key user parameter types including choice, connections, and file URLs. Allow users to control how a workflow runs, making your workflows more reusable. Learn to import values and deliver the best user experience for your workflows while enhancing accuracy.
- Optimization Strategies in FME Flow: Explore the creation and strategic deployment of parameters in FME Flow, including the use of deployment and geometry parameters, to maximize workflow efficiency.
- Pro Tips for Success: Gain insights on parameterizing connections and leveraging new features like Conditional Visibility for clarity and simplicity.
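As one concrete example of parameters at work, the sketch below reads a published user parameter from inside an FME PythonCaller; the OUTPUT_FORMAT parameter name is illustrative, and the snippet only runs inside FME, where fme.macroValues and pyoutput are provided by the runtime.

```python
# Read a workspace parameter and stamp it onto each feature.
import fme

class FeatureProcessor(object):
    def input(self, feature):
        # fme.macroValues maps parameter names to their current values.
        fmt = fme.macroValues.get("OUTPUT_FORMAT", "GeoJSON")
        feature.setAttribute("requested_format", fmt)
        self.pyoutput(feature)
```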
We’ll wrap up with a glimpse into future webinars, followed by a Q&A session to address your specific questions surrounding this topic.
Don’t miss this opportunity to elevate your FME expertise and drive your projects to new heights of efficiency.
Epistemic Interaction - tuning interfaces to provide information for AI supportAlan Dix
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
UiPath Test Automation using UiPath Test Suite series, part 3DianaGray10
Welcome to part 3 of the UiPath Test Automation using UiPath Test Suite series. In this session, we will cover desktop automation along with UI automation.
Topics covered:
- UI automation introduction
- UI automation sample
- Desktop automation flow
Pradeep Chinnala, Senior Consultant Automation Developer @WonderBotz and UiPath MVP
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Securing your Kubernetes cluster_ a step-by-step guide to success !KatiaHIMEUR1
Today, with several years of maturity behind it, an extremely active community, and an ultra-dynamic ecosystem, Kubernetes has established itself as the de facto standard in container orchestration. Thanks to a wide range of managed services, it has never been easier to set up a ready-to-use Kubernetes cluster.
However, this ease of use means that security in Kubernetes is often left for later, or even neglected, exposing companies to significant risks.
In this talk, I'll show you step by step how to secure your Kubernetes cluster for greater peace of mind and reliability.
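As a taste of one such step, the sketch below applies a default-deny ingress NetworkPolicy with the official Kubernetes Python client; the namespace is a placeholder, and a real hardening pass would also cover RBAC, admission control, secrets management, and more.

```python
# Apply a default-deny ingress NetworkPolicy to a namespace.
from kubernetes import client, config

config.load_kube_config()  # or load_incluster_config() inside a pod

deny_all = client.V1NetworkPolicy(
    metadata=client.V1ObjectMeta(name="default-deny-ingress"),
    spec=client.V1NetworkPolicySpec(
        pod_selector=client.V1LabelSelector(),  # empty selector = every pod
        policy_types=["Ingress"],               # no ingress rules = deny all
    ),
)
client.NetworkingV1Api().create_namespaced_network_policy(
    namespace="prod", body=deny_all)
```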