Banks, payment providers, and capital markets firms are under intense regulatory mandates to process huge amounts of transaction-related data from both traditional and non-traditional sources. Compliance teams need to constantly analyze data in motion (wires, fund transfers, banking transactions) and data at rest (years' worth of historical data) for the actionable intelligence required for Suspicious Activity Reports, in order to discover illegal activity and provide detailed reporting to authorities. Annual estimates of global money laundering flows range anywhere from $1 trillion to $2 trillion, almost 5% of global GDP. Almost all of this is laundered via retail and merchant banks, payment networks, securities and futures firms, casino services and clubs, and the like, which explains why annual AML-related fines on banking organizations run into the billions and are increasing every year. However, the number of SARs (Suspicious Activity Reports) filed by banking institutions is much higher as a category than the number filed by these other businesses. In this presentation we will discuss the business imperatives, the value drivers, and the woeful inadequacy of current technology architectures and approaches in tackling AML. We will then pivot to a deep dive into how Big Data and Predictive Analytics can ease and solve these vexing challenges that banking executives are grappling with globally.
The talk will have three parts: an overview of practical applications of AI and ML in the FinTech industry, with a short explanation of the PSD2 directive and the disruption it caused; applications of AI/ML from the perspective of the end user (personal financial health, financial coaching, etc.); and an overview of the architecture, technologies, and frameworks used, with practical examples from the Zuper company.
Credit Card Fraudulent Transaction Detection Research Paper - Garvit Burad
Credit card fraudulent transaction detection research paper using machine learning technologies like Logistic Regression, Random Forest, and Feature Engineering, plus various techniques to deal with a highly skewed dataset.
Towards the Next Generation Financial Crimes Platform - How Data, Analytics, ... - Molly Alexander
Towards the Next Generation Financial Crimes Platform - How Data, Analytics, & ML Are Transforming the Fight Against Fraud, AML & Cybersecurity - Nadeem Asghar
Enterprise Fraud Management: How Banks Need to Adapt - Capgemini
Fraud prevention is becoming one of the biggest areas of concern for the financial services industry, but first-generation fraud management systems are falling short. By moving towards a more enterprise-wide approach to fraud management, financial institutions can combat the increasingly treacherous fraud and cybercrime landscape while reaping numerous benefits for the organization.
Measuring and Managing Credit Risk With Machine Learning and Artificial Intel... - Accenture
In recent years, technological developments have undergone in-depth analysis among banks, but we are still far from attaining mature levels, both methodologically and in the credit granting, monitoring, and control processes. Banks should equip themselves with new, more structured Model Risk frameworks to manage new machine learning model validation paradigms. Learn more from Accenture Finance & Risk: https://accntu.re/2qGUUMx
Tutorial on 'Explainability for NLP' given at the first ALPS (Advanced Language Processing) winter school: http://lig-alps.imag.fr/index.php/schedule/
The talk introduces the concepts of 'model understanding' as well as 'decision understanding' and provides examples of approaches from the areas of fact checking and text classification.
Exercises to go with the tutorial are available here: https://github.com/copenlu/ALPS_2021
Fraud detection is a popular application of machine learning, but it is not as obvious or as common as it seems. I'll tell how QuantUp implemented it for the WARTA insurance company (a subsidiary of Talanx International AG).
The models developed reduced losses by between 10% and 30%. The project was not a simple one because of the complex claims-handling process and the really rich dataset involved. The tools applied were R (modeling) and DataWalk (data preparation). You will learn what is important in the development of such solutions in general, what was difficult in this particular project, and how to overcome possible difficulties in similar projects.
Fraud continues to proliferate across financial institutions, through multiple lines of business and banking channels. Increasingly sophisticated criminal tactics and the proliferation of organized crime rings make detecting fraud difficult and preventing it nearly impossible. Adding to the complexity is increased globalization and growth through mergers and acquisitions, which makes it harder to effectively monitor multiple portfolios and business lines. The presentation discusses best practices and ideas around the prevention, investigation, and detection of possible fraudulent activities across multiple industries.
Lifting the Barriers to Retail Innovation in ASEAN | A.T. Kearney - Kearney
Rising incomes and growing demand for consumer goods and services in ASEAN create rich opportunities for retailers in the region, which is especially significant as member nations join forces to become an economic powerhouse. Yet ASEAN retailers have been slow to innovate, and as this market opens up, stepping up innovation is required to capitalize fully on the opportunities.
AI-powered decision making in banks - how banks today are using advanced analytics in credit decisioning to enhance customer lifetime value, lower operating costs, and strengthen customer acquisition.
Crime sensing with big data - Singapore perspective - Benjamin Ang
This presentation examines the potential and limitations of using big data for crime estimation. Singapore laws discussed include the Personal Data Protection Act (PDPA), the Penal Code, and the Criminal Procedure Code (CPC). Topics covered include crime analysis, crime prediction, algorithm bias, and other risks. The video of this presentation can be found at https://youtu.be/kctB3lRLh2U
Smarter Fraud Detection With Graph Data Science - Neo4j
Join us for this 20-minute webinar to hear from Nick Johnson, Product Marketing Manager for Graph Data Science, to learn the basics of Neo4j Graph Data Science and how it can help you to identify fraudulent activities faster.
Build Intelligent Fraud Prevention with Machine Learning and Graphs - Neo4j
See how financial services, banking and retail are using graph-enhanced machine learning to thwart fraud. Fraudsters are becoming increasingly sophisticated, organized and adaptive; traditional, rule-based solutions are not broad or nimble enough to deal with this reality. This session will cover several demonstrations and real-world technical examples including preventing credit card fraud, identifying money laundering and reducing false positives.
The Covid-19 pandemic necessitated a facelift for the payments industry, sparked by novel approaches from new-age players, fostered by industry consolidation, and driven by customers' demand for end-to-end experience. Crossing the threshold, the industry is entering a new era - Payments 4.X - where payments are embedded, invisible, and an enabling function for a frictionless customer experience. As customers make a permanent shift to next-gen payment methods, Digital IDs are critical for a seamless payment experience. The B2B payments segment is witnessing rapid digitization. BigTechs, PayTechs, and industry newcomers are ready to jump in with newfangled solutions to help underserved small to medium-sized businesses (SMBs).
As incumbents struggle with profits, new-age firms are forging ahead to take the lead in the Payments 4.X era by riding the success of non-card products and services. The new era demands collaboration and platformification, and firms can unleash full market potential only by embracing API-based business models and open ecosystems. Data prowess and enhanced payment processing capabilities are essential to thrive. The clock is ticking for banks and traditional payments firms because competitive advantage is not guaranteed forever. As industry players seek economies of scale, consolidations loom, and non-banks explore new territories to threaten incumbents' market share. While all these 2022 trends are at play, central bank digital currency (CBDC) is emerging globally and might open a new chapter in the current payments landscape.
This second machine age has seen the rise of artificial intelligence (AI), or “intelligence” that is not the result of human cogitation. It is now ubiquitous in many commercial products, from search engines to virtual assistants. AI is the result of exponential growth in computing power, memory capacity, cloud computing, distributed and parallel processing, open-source solutions, and global connectivity of both people and machines. The massive volume and velocity of structured and unstructured (e.g., text, audio, video, sensor) data being generated has made it a necessity to process that data speedily and generate meaningful, actionable insights from it.
The journey from open banking to open finance+: the evolution of API-based open banking as it stands now and where it could go from here, plus risks and opportunities for market participants.
With volatile markets, an edgy economy, organizational change, and an evolving regulatory landscape, finance divisions are caught up in a rapid increase in public scrutiny and change. All the while, the need for cost cutting and transparent reporting remains constant. Rolta's financial analytics solution CFO Impact helps you bring cost-effective and sustainable transformation to financial processes and systems with the help of big data analytics technologies.
An overview of the FRAUD solution specific to the GCC market. Includes specific policy rules, negative data, and scorecards built upon 350,000 historical accounts.
This presentation explores what the future of commerce may look like given current trends in mobile devices, digital payments, social commerce, and security, including tokenization and new forms of identity verification.
Webinar Deck: Efficient Methods for Managing Global Cash in Today's Regulator... - Kyriba Corporation
Check out our PowerPoint for Efficient Methods for Managing Global Cash in Today's Regulatory Regime, where the expert speakers explored proven liquidity and intercompany cash management strategies, as well as tax/treasury collaborative initiatives that can help optimize global cash in an ever-changing, complex environment.
Early Stage Fintech Investment Thesis (Sept 2016) - Earnest Sweat
Here is an example of a personal investment thesis that I created to share with venture capital firms. In this example, I provide my personal perspective on the fintech sector. For details on how I built this thesis, check out my blog (https://goo.gl/CU4Qid).
Note: Some of the confidential information has been redacted for privacy.
Corporate Treasurers Focus on Cyber Security - Joan Weber
Treasury departments at large U.S. companies rank IT security as their top priority for 2015 - ahead of such critical issues as cost management and regulatory/compliance challenges.
These findings come from the Greenwich Associates 2014 U.S. Large Corporate Finance Study, for which the firm interviewed CFOs or treasury department representatives at more than 500 large U.S. companies.
The study results suggest that U.S. companies are taking action to address security concerns and other IT issues, with 63% of participants saying their treasury departments will increase technology spending in the year ahead.
In this session we will discuss the business case for a proactive, real-time fraud prevention strategy which enables you to maximize revenue opportunities whilst minimizing fraud. During the session we will create a fraud management check list which combines People, Processes and Technology, underpinned by data, analysis and tailored rules.
Sample Report: Fraud and Security in Global Online Payments 2016 - yStats.com
Free report samples for our publication "Fraud and Security in Global Online Payments 2016".
Find the full report available for purchase at: https://ystats.com/shop/fraud-and-security-in-global-online-payments-2022/
Deloitte Dbriefs Program Guide | April - June 2014 - Franco Ferrario
Object: Anticipating tomorrow's complex issues and new strategies is a challenge. Stay tuned in with Dbriefs live webcasts that give you valuable insights on important developments affecting your business.
Uploaded by Franco Ferrario, Technology Executive; Deloitte Evangelist
ActiveInsight offers real-time, value-based detection of and reaction to complex event patterns. This presentation gives an overview of the business needs, ActiveInsight's features, and several relevant use cases. See http://www.activeinsight.net for more information.
Introduction: This workshop will provide a hands-on introduction to Machine Learning (ML) with an overview of Deep Learning (DL).
Format: An introductory lecture on several supervised and unsupervised ML techniques, followed by a light introduction to DL and a short discussion of the current state of the art. Several Python code samples using the scikit-learn library will be introduced that users will be able to run in the Cloudera Data Science Workbench (CDSW).
Objective: To provide a quick, hands-on introduction to ML with Python's scikit-learn library. The environment in CDSW is interactive, and the step-by-step guide will walk you through everything from setting up your environment to exploring datasets and training and evaluating models on popular datasets. By the end of the crash course, attendees will have a high-level understanding of popular ML algorithms and the current state of DL, know what problems they can solve, and walk away with basic hands-on experience training and evaluating ML models.
Prerequisites: For the hands-on portion, registrants must bring a laptop with a Chrome or Firefox web browser. These labs will be done in the cloud; no installation is needed. Everyone will be able to register and start using CDSW after the introductory lecture concludes (about 1 hr in). Basic knowledge of Python is highly recommended.
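As a taste of the hands-on portion, here is a minimal sketch of the kind of scikit-learn exercise the labs walk through; the dataset and model choices are illustrative, not the workshop's actual lab code.

```python
# Minimal sketch: train and evaluate a classifier with scikit-learn.
# Illustrative only -- the workshop's actual labs run inside CDSW.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
print("Test accuracy:", accuracy_score(y_test, model.predict(X_test)))
```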
Floating on a RAFT: HBase Durability with Apache Ratis - DataWorks Summit
In a world with a myriad of distributed storage systems to choose from, the majority of Apache HBase clusters still rely on Apache HDFS. Theoretically, any distributed file system could be used by HBase. One major reason HDFS is predominantly used is the specific durability requirements of HBase's write-ahead log (WAL), and HDFS provides that guarantee correctly. However, HBase's use of HDFS for WALs can be replaced with sufficient effort.
This talk will cover the design of a "Log Service" which can be embedded inside of HBase and provides the level of durability that HBase requires for WALs. Apache Ratis (incubating) is a library implementation of the RAFT consensus protocol in Java and is used to build this Log Service. We will cover the design choices of the Ratis Log Service, comparing and contrasting it to other log-based systems that exist today. Next, we'll cover how the Log Service "fits" into HBase and the necessary changes to HBase which enable this. Finally, we'll discuss how the Log Service can simplify the operational burden of HBase.
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi - DataWorks Summit
Utilizing Apache NiFi, we read various open data REST APIs and camera feeds to ingest crime and related data in real time, streaming it into HBase and Phoenix tables. HBase makes an excellent storage option for our real-time time-series data sources. We can immediately query our data utilizing Apache Zeppelin against Phoenix tables, as well as Hive external tables over HBase.
Apache Phoenix tables also make a great option since we can easily put microservices on top of them for application usage. I have an example Spring Boot application that reads from our Philadelphia crime table for front-end web applications as well as RESTful APIs.
Apache NiFi makes it easy to push records with schemas to HBase and insert into Phoenix SQL tables.
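To make the Phoenix side concrete, here is a minimal sketch using the phoenixdb Python client against the Phoenix Query Server; the crime_events table and its columns are hypothetical stand-ins for the actual crime schema.

```python
# Minimal sketch: create and upsert into a Phoenix table through the
# Phoenix Query Server. Table name and columns are hypothetical.
import datetime
import phoenixdb

conn = phoenixdb.connect("http://localhost:8765/", autocommit=True)
cur = conn.cursor()
cur.execute(
    "CREATE TABLE IF NOT EXISTS crime_events ("
    "  event_id VARCHAR PRIMARY KEY,"
    "  event_time TIMESTAMP,"
    "  offense VARCHAR)"
)
cur.execute(
    "UPSERT INTO crime_events VALUES (?, ?, ?)",
    ("evt-0001", datetime.datetime(2019, 5, 1, 12, 0), "theft"),
)
cur.execute("SELECT offense, COUNT(*) FROM crime_events GROUP BY offense")
print(cur.fetchall())
```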
Resources:
https://community.hortonworks.com/articles/54947/reading-opendata-json-and-storing-into-phoenix-tab.html
https://community.hortonworks.com/articles/56642/creating-a-spring-boot-java-8-microservice-to-read.html
https://community.hortonworks.com/articles/64122/incrementally-streaming-rdbms-data-to-your-hadoop.html
HBase Tales From the Trenches - Short stories about most common HBase operati... - DataWorks Summit
Whilst HBase is the most logical answer for use cases requiring random, real-time read/write access to big data, it may not be so trivial to design applications that make the most of it, nor the simplest to operate. As it depends on and integrates with other components from the Hadoop ecosystem (ZooKeeper, HDFS, Spark, Hive, etc.) or external systems (Kerberos, LDAP), and its distributed nature requires a "Swiss clockwork" infrastructure, many variables are to be considered when observing anomalies or even outages. Adding to the equation, there's also the fact that HBase is still an evolving product, with different release versions in use currently, some of which carry genuine software bugs. In this presentation, we'll go through the most common HBase issues faced by different organisations, describing the identified causes and resolution actions from my last 5 years supporting HBase for our heterogeneous customer base.
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac... - DataWorks Summit
LocationTech GeoMesa enables spatial and spatiotemporal indexing and queries for HBase and Accumulo. In this talk, after an overview of GeoMesa’s capabilities in the Cloudera ecosystem, we will dive into how GeoMesa leverages Accumulo’s Iterator interface and HBase’s Filter and Coprocessor interfaces. The goal will be to discuss both what spatial operations can be pushed down into the distributed database and also how the GeoMesa codebase is organized to allow for consistent use across the two database systems.
OCLC has been using HBase since 2012 to enable single-search-box access to over a billion items from your library and the world’s library collection. This talk will provide an overview of how HBase is structured to provide this information and some of the challenges they have encountered to scale to support the world catalog and how they have overcome them.
Many individuals/organizations have a desire to utilize NoSQL technology, but often lack an understanding of how the underlying functional bits can be utilized to enable their use case. This situation can result in drastic increases in the desire to put the SQL back in NoSQL.
Since the initial commit, Apache Accumulo has provided a number of examples to help jumpstart comprehension of how some of these bits function as well as potentially help tease out an understanding of how they might be applied to a NoSQL friendly use case. One very relatable example demonstrates how Accumulo could be used to emulate a filesystem (dirlist).
In this session we will walk through the dirlist implementation. Attendees should come away with an understanding of the supporting table designs, a simple text search supporting a single wildcard (on file/directory names), and how the dirlist elements work together to accomplish its feature set. Attendees should (hopefully) also come away with a justification for sometimes keeping the SQL out of NoSQL.
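To give a flavor of the table design before the session: the dirlist example encodes each path's depth into the row ID so that all entries at one level of a directory tree sort together and can be scanned as a single range. A simplified Python illustration of that scheme (not the Accumulo example's actual code):

```python
# Simplified illustration of a dirlist-style row-key scheme: prefix
# each path with its zero-padded depth so one directory level forms a
# contiguous, scannable range in a sorted key-value store.
def dirlist_row(path: str) -> str:
    depth = 0 if path == "/" else path.rstrip("/").count("/")
    return f"{depth:03d}{path}"

rows = sorted(dirlist_row(p) for p in ["/", "/etc", "/home", "/home/alice", "/home/bob"])
print(rows)
# ['000/', '001/etc', '001/home', '002/home/alice', '002/home/bob']
```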
HBase Global Indexing to support large-scale data ingestion at Uber - DataWorks Summit
Data serves as the platform for decision-making at Uber. To facilitate data-driven decisions, many datasets at Uber are ingested into a Hadoop Data Lake and exposed to querying via Hive. Analytical queries joining various datasets are run to better understand business data at Uber.
Data ingestion, in its most basic form, is about organizing data to balance efficient reading and writing of newer data. Data organization for efficient reading involves factoring in query patterns to partition data so that read amplification stays low. Data organization for efficient writing involves factoring in the nature of the input data - whether it is append-only or updatable.
At Uber we ingest terabytes of data into many critical tables, such as trips, that are updatable. These tables are a fundamental part of Uber's data-driven solutions and act as the source of truth for all analytical use cases across the entire company. Datasets such as trips constantly receive updates apart from inserts. To ingest such datasets we need a critical component that is responsible for bookkeeping the data layout and annotates each incoming change with the location in HDFS where the data should be written. This component is called Global Indexing. Without it, all records get treated as inserts and get re-written to HDFS instead of being updated, which leads to duplication of data and breaks data correctness and user queries. This component is key to scaling our jobs, where we now handle greater than 500 billion writes a day in our current ingestion systems. It needs to have strong consistency and provide high throughput for index writes and reads.
At Uber, we have chosen HBase as the backing store for the Global Indexing component, which is critical in allowing us to scale our jobs to the more than 500 billion writes a day our ingestion systems now handle. In this talk, we will discuss data at Uber and expound on why we built the global index using Apache HBase and how it helps scale our cluster usage. We'll give details on why we chose HBase over other storage systems; how and why we came up with a creative solution to load HFiles directly into the backend, circumventing the normal write path when bootstrapping our ingestion tables to avoid QPS constraints; and other learnings from bringing this system into production at the scale of data that Uber encounters daily.
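At its core, such a global index is a consistent key-to-location lookup. A minimal sketch of the idea with the happybase Python client (the table and column names are hypothetical; Uber's actual component is considerably more involved):

```python
# Minimal sketch of a global-index lookup in HBase via happybase: map
# each record key to the file that currently holds the record, so an
# incoming change is routed as an update rather than re-inserted.
import happybase

conn = happybase.Connection("localhost")  # HBase Thrift server
index = conn.table("record_index")        # hypothetical index table

def annotate(record_key: bytes, new_file: bytes) -> bytes:
    row = index.row(record_key)
    location = row.get(b"loc:file")
    if location is None:                   # unseen key: treat as insert
        index.put(record_key, {b"loc:file": new_file})
        return new_file
    return location                        # known key: route as update
```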
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix - DataWorks Summit
Recently, Apache Phoenix has been integrated with Apache (incubator) Omid transaction processing service, to provide ultra-high system throughput with ultra-low latency overhead. Phoenix has been shown to scale beyond 0.5M transactions per second with sub-5ms latency for short transactions on industry-standard hardware. On the other hand, Omid has been extended to support secondary indexes, multi-snapshot SQL queries, and massive-write transactions.
These innovative features make Phoenix an excellent choice for translytics applications, which allow converged transaction processing and analytics. We share the story of building the next-gen data tier for advertising platforms at Verizon Media that exploits Phoenix and Omid to support multi-feed real-time ingestion and AI pipelines in one place, and discuss the lessons learned.
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi - DataWorks Summit
Cybersecurity requires an organization to collect data, analyze it, and alert on cyber anomalies in near real time. This is a challenging endeavor when considering the variety of data sources which need to be collected and analyzed. Everything from application logs, network events, authentication systems, IoT devices, business events, cloud service logs, and more needs to be taken into consideration. In addition, multiple data formats need to be transformed and conformed to be understood by both humans and ML/AI algorithms.
To solve this problem, the Aetna Global Security team developed the Unified Data Platform based on Apache NiFi, which allows them to remain agile and adapt to new security threats and the onboarding of new technologies in the Aetna environment. The platform currently has over 60 different data flows with 95% doing real-time ETL and handles over 20 billion events per day. In this session learn from Aetna’s experience building an edge to AI high-speed data pipeline with Apache NiFi.
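The "transform and conform" step amounts to mapping each source's fields onto one common event schema. A hedged Python sketch of the idea, with invented field names (in the platform itself this logic lives in NiFi processors, not application code):

```python
# Illustrative only: conform two hypothetical source formats into one
# common event schema, as a NiFi flow's transform stage would.
def conform_auth_event(raw: dict) -> dict:
    return {
        "timestamp": raw["ts"],
        "source": "auth",
        "user": raw["userName"],
        "outcome": "success" if raw["result"] == 0 else "failure",
    }

def conform_firewall_event(raw: dict) -> dict:
    return {
        "timestamp": raw["event_time"],
        "source": "firewall",
        "user": raw.get("user", "unknown"),
        "outcome": raw["action"],
    }
```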
In the healthcare sector, data security, governance, and quality are crucial for maintaining patient privacy and ensuring the highest standards of care. At Florida Blue, the leading health insurer of Florida serving over five million members, there is a multifaceted network of care providers, business users, sales agents, and other divisions relying on the same datasets to derive critical information for multiple applications across the enterprise. However, maintaining consistent data governance and security for protected health information and other extended data attributes has always been a complex challenge that did not easily accommodate the wide range of needs for Florida Blue’s many business units. Using Apache Ranger, we developed a federated Identity & Access Management (IAM) approach that allows each tenant to have their own IAM mechanism. All user groups and roles are propagated across the federation in order to determine users’ data entitlement and access authorization; this applies to all stages of the system, from the broadest tenant levels down to specific data rows and columns. We also enabled audit attributes to ensure data quality by documenting data sources, reasons for data collection, date and time of data collection, and more. In this discussion, we will outline our implementation approach, review the results, and highlight our “lessons learned.”
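For a sense of what automating such policies can look like, here is a hedged sketch that creates a simple Hive access policy through Ranger's public REST API; the endpoint and JSON shape follow the v2 public API but should be verified against your Ranger version, and every host, service, and user name below is a placeholder.

```python
# Hedged sketch: create a Hive access policy via Ranger's public REST
# API. All host, service, and user names are placeholders.
import requests

policy = {
    "service": "hive_service",      # Ranger service name (placeholder)
    "name": "claims_readonly",
    "resources": {
        "database": {"values": ["claims"]},
        "table": {"values": ["members"]},
        "column": {"values": ["*"]},
    },
    "policyItems": [{
        "users": ["analyst"],
        "accesses": [{"type": "select", "isAllowed": True}],
    }],
}
resp = requests.post(
    "https://ranger.example.com:6182/service/public/v2/api/policy",
    json=policy,
    auth=("admin", "password"),
)
resp.raise_for_status()
```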
Presto: Optimizing Performance of SQL-on-Anything Engine - DataWorks Summit
Presto, an open source distributed SQL engine, is widely recognized for its low-latency queries, high concurrency, and native ability to query multiple data sources. Proven at scale in a variety of use cases at Airbnb, Bloomberg, Comcast, Facebook, FINRA, LinkedIn, Lyft, Netflix, Twitter, and Uber, in the last few years Presto experienced an unprecedented growth in popularity in both on-premises and cloud deployments over Object Stores, HDFS, NoSQL and RDBMS data stores.
With the ever-growing list of connectors to new data sources such as Azure Blob Storage, Elasticsearch, Netflix Iceberg, Apache Kudu, and Apache Pulsar, the recently introduced Cost-Based Optimizer in Presto must account for heterogeneous inputs with differing and often incomplete data statistics. This talk will explore this topic in detail as well as discuss best use cases for Presto across several industries. In addition, we will present recent Presto advancements such as geospatial analytics at scale and the project roadmap going forward.
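One concrete way to exercise the "SQL-on-anything" idea is a federated join across two catalogs. A minimal sketch with the presto-python-client; the host, catalogs, and table names are placeholders:

```python
# Minimal sketch: join Hive and MySQL catalogs in a single Presto
# query via the presto-python-client. All names are placeholders.
import prestodb

conn = prestodb.dbapi.connect(
    host="presto.example.com", port=8080,
    user="analyst", catalog="hive", schema="default",
)
cur = conn.cursor()
cur.execute("""
    SELECT o.order_id, c.segment
    FROM hive.sales.orders AS o
    JOIN mysql.crm.customers AS c ON o.customer_id = c.id
    LIMIT 10
""")
for row in cur.fetchall():
    print(row)
```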
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl... - DataWorks Summit
Specialized tools for machine learning development and model governance are becoming essential. MLflow is an open source platform for managing the machine learning lifecycle. Just by adding a few lines of code in the function or script that trains their model, data scientists can log parameters, metrics, artifacts (plots, miscellaneous files, etc.) and a deployable packaging of the ML model. Every time that function or script is run, the results will be logged automatically as a byproduct of those lines of code being added, even if the party doing the training run makes no special effort to record the results. MLflow application programming interfaces (APIs) are available for the Python, R and Java programming languages, and MLflow sports a language-agnostic REST API as well. Over a relatively short time period, MLflow has garnered more than 3,300 stars on GitHub, almost 500,000 monthly downloads and 80 contributors from more than 40 companies. Most significantly, more than 200 companies are now using MLflow. We will demo the MLflow Tracking, Projects and Models components with Azure Machine Learning (AML) Services and show you how easy it is to get started with MLflow on-prem or in the cloud.
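The "few lines of code" claim is easy to illustrate. A minimal sketch of MLflow tracking wrapped around a scikit-learn training run (the parameter and dataset choices are illustrative):

```python
# Minimal sketch: log a parameter, a metric, and the fitted model with
# MLflow tracking; each run is recorded automatically.
import mlflow
import mlflow.sklearn
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
with mlflow.start_run():
    C = 0.5
    mlflow.log_param("C", C)
    model = LogisticRegression(C=C, max_iter=200).fit(X, y)
    mlflow.log_metric("train_accuracy", model.score(X, y))
    mlflow.sklearn.log_model(model, "model")
```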
Extending Twitter's Data Platform to Google Cloud - DataWorks Summit
Twitter's Data Platform is built using multiple complex open source and in-house projects to support data analytics on hundreds of petabytes of data. Our platform supports storage, compute, data ingestion, discovery, and management, along with various tools and libraries that help users with both batch and real-time analytics. Our Data Platform operates on multiple clusters across different data centers to help thousands of users discover valuable insights. As we were scaling our Data Platform to multiple clusters, we also evaluated various cloud vendors to support use cases outside of our data centers. In this talk we share our architecture and how we extend our data platform to use the cloud as another data center. We walk through our evaluation process and the challenges we faced supporting data analytics at Twitter scale on the cloud, and we present our current solution. Extending Twitter's Data Platform to the cloud was a complex task, which we deep-dive into in this presentation.
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi - DataWorks Summit
At Comcast, our team has been architecting a customer experience platform which is able to react to near-real-time events and interactions and deliver appropriate and timely communications to customers. By combining the low latency capabilities of Apache Flink and the dataflow capabilities of Apache NiFi we are able to process events at high volume to trigger, enrich, filter, and act/communicate to enhance customer experiences. Apache Flink and Apache NiFi complement each other with their strengths in event streaming and correlation, state management, command-and-control, parallelism, development methodology, and interoperability with surrounding technologies. We will trace our journey from starting with Apache NiFi over three years ago and our more recent introduction of Apache Flink into our platform stack to handle more complex scenarios. In this presentation we will compare and contrast which business and technical use cases are best suited to which platform and explore different ways to integrate the two platforms into a single solution.
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger - DataWorks Summit
Companies are increasingly moving to the cloud to store and process data. One of the challenges companies have is securing data across hybrid environments with an easy way to centrally manage policies. In this session, we will talk through how companies can use Apache Ranger to protect access to data both in on-premise and in cloud environments. We will go into detail on the challenges of the hybrid environment and how Ranger can solve them. We will also talk through how companies can further enhance security by leveraging Ranger to anonymize or tokenize data while moving it into the cloud and to de-anonymize it dynamically using Apache Hive, Apache Spark, or when accessing data from cloud storage systems. We will also deep-dive into Ranger's integration with AWS S3, AWS Redshift, and other cloud-native systems. We will wrap up with an end-to-end demo showing how policies can be created in Ranger and used to manage access to data in different systems, anonymize or de-anonymize data, and track where data is flowing.
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory... - DataWorks Summit
Advanced Big Data Processing frameworks have been proposed to harness the fast data transmission capability of Remote Direct Memory Access (RDMA) over high-speed networks such as InfiniBand, RoCEv1, RoCEv2, iWARP, and OmniPath. However, with the introduction of the Non-Volatile Memory (NVM) and NVM express (NVMe) based SSD, these designs along with the default Big Data processing models need to be re-assessed to discover the possibilities of further enhanced performance. In this talk, we will present, NRCIO, a high-performance communication runtime for non-volatile memory over modern network interconnects that can be leveraged by existing Big Data processing middleware. We will show the performance of non-volatile memory-aware RDMA communication protocols using our proposed runtime and demonstrate its benefits by incorporating it into a high-performance in-memory key-value store, Apache Hadoop, Tez, Spark, and TensorFlow. Evaluation results illustrate that NRCIO can achieve up to 3.65x performance improvement for representative Big Data processing workloads on modern data centers.
Background: Some early applications of Computer Vision in Retail arose from e-commerce use cases - but increasingly, it is being used in physical stores in a variety of new and exciting ways, such as:
● Optimizing merchandising execution, in-stocks and sell-thru
● Enhancing operational efficiencies, enabling real-time customer engagement
● Enhancing loss prevention capabilities, response time
● Creating frictionless experiences for shoppers
Abstract: This talk will cover the use of Computer Vision in Retail, the implications to the broader Consumer Goods industry and share business drivers, use cases and benefits that are unfolding as an integral component in the remaking of an age-old industry.
We will also take a ‘peek under the hood’ of Computer Vision and Deep Learning, sharing technology design principles and skill set profiles to consider before starting your CV journey.
Deep learning has matured considerably in the past few years to produce human or superhuman abilities in a variety of computer vision paradigms. We will discuss ways to recognize these paradigms in retail settings, collect and organize data to create actionable outcomes with the new insights and applications that deep learning enables.
We will cover the basics of object detection, then move into the advanced processing of images, describing possible ways a retail store of the near future could operate: identifying various storefront situations by attaching a deep learning system to a camera stream - such things as identifying item stocks on shelves, a shelf in need of organization, or perhaps a wandering customer in need of assistance.
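A minimal sketch of the detection building block behind such scenarios, using a pretrained torchvision model; a generic COCO-trained detector and a single image file stand in here for a retail-trained model on a live camera stream.

```python
# Minimal sketch: run a pretrained COCO object detector on one frame.
# A production system would use a retail-trained model and video.
import torch
import torchvision
from torchvision.io import read_image
from torchvision.transforms.functional import convert_image_dtype

model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

frame = convert_image_dtype(read_image("shelf_camera_frame.jpg"), torch.float)
with torch.no_grad():
    detections = model([frame])[0]

for box, score in zip(detections["boxes"], detections["scores"]):
    if score > 0.8:  # keep only confident detections
        print(box.tolist(), float(score))
```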
We will also cover how to use a computer vision system to automatically track customer purchases to enable a streamlined checkout process, and how deep learning can power plausible wardrobe suggestions based on what a customer is currently wearing or purchasing.
Finally, we will cover the various technologies that are powering these applications today: deep learning tools for research and development, production tools to distribute that intelligence to an entire inventory of cameras situated around a retail location, and tools for exploring and understanding the new data streams produced by the computer vision systems.
By the end of this talk, attendees should understand the impact Computer Vision and Deep Learning are having in the Consumer Goods industry, key use cases, techniques and key considerations leaders are exploring and implementing today.
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark - DataWorks Summit
Whole genome shotgun based next-generation transcriptomics and metagenomics studies often generate 100 to 1000 gigabytes (GB) of sequence data derived from tens of thousands of different genes or microbial species. De novo assembling these data requires an ideal solution that both scales with data size and optimizes for individual genes or genomes. Here we developed an Apache Spark-based scalable sequence clustering application, SparkReadClust (SpaRC), that partitions reads based on their molecule of origin to enable downstream assembly optimization. SpaRC produces high clustering performance on transcriptomics and metagenomics test datasets from both short-read and long-read sequencing technologies. It achieved near-linear scalability with respect to input data size and number of compute nodes. SpaRC can run on different cloud computing environments without modification while delivering similar performance. In summary, our results suggest SpaRC provides a scalable solution for clustering billions of reads from next-generation sequencing experiments, and Apache Spark represents a cost-effective solution with rapid development/deployment cycles for similar big data genomics problems.
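To give a flavor of the approach: SpaRC-style clustering rests on grouping reads that share k-mers. A toy PySpark sketch of that first step, with made-up reads (the real pipeline builds an overlap graph and partitions it on top of this):

```python
# Toy sketch of the k-mer grouping step behind read clustering: emit
# (k-mer, read_id) pairs, then collect read ids that share a k-mer.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("kmer-grouping").getOrCreate()
reads = spark.sparkContext.parallelize(
    [("r1", "ACGTACGT"), ("r2", "CGTACGTT"), ("r3", "TTTTAAAA")]
)
K = 5

def kmers(read):
    read_id, seq = read
    return [(seq[i:i + K], read_id) for i in range(len(seq) - K + 1)]

shared = (reads.flatMap(kmers)
               .groupByKey()
               .mapValues(set)
               .filter(lambda kv: len(kv[1]) > 1))  # k-mers linking reads
print(shared.collect())  # r1 and r2 overlap; r3 stands alone
```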
PHP Frameworks: I want to break free (IPC Berlin 2024) - Ralf Eggert
In this presentation, we examine the challenges and limitations of relying too heavily on PHP frameworks in web development. We discuss the history of PHP and its frameworks to understand how this dependence has evolved. The focus will be on providing concrete tips and strategies to reduce reliance on these frameworks, based on real-world examples and practical considerations. The goal is to equip developers with the skills and knowledge to create more flexible and future-proof web applications. We'll explore the importance of maintaining autonomy in a rapidly changing tech landscape and how to make informed decisions in PHP development.
This talk is aimed at encouraging a more independent approach to using PHP frameworks, moving towards a more flexible and future-proof approach to PHP development.
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do... - UiPathCommunity
💥 Speed, accuracy, and scaling – discover the superpowers of GenAI in action with UiPath Document Understanding and Communications Mining™:
- See how to accelerate model training and optimize model performance with active learning
- Learn about the latest enhancements to out-of-the-box document processing – with little to no training required
- Get an exclusive demo of the new family of UiPath LLMs – GenAI models specialized for processing different types of documents and messages
This is a hands-on session specifically designed for automation developers and AI enthusiasts seeking to enhance their knowledge in leveraging the latest intelligent document processing capabilities offered by UiPath.
Speakers:
👨🏫 Andras Palfi, Senior Product Manager, UiPath
👩🏫 Lenka Dulovicova, Product Program Manager, UiPath
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo... - James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. A constant focus on speed to release software to market, along with traditionally slow and manual security checks, has caused gaps in continuous security, an important piece of the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their application supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with PASSION for technology and making things work along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations in CI/CD and application security integrated in software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
Accelerate your Kubernetes clusters with Varnish Caching - Thijs Feryn
A presentation about the usage and availability of Varnish on Kubernetes. This talk explores the capabilities of Varnish caching and shows how to use the Varnish Helm chart to deploy it to Kubernetes.
This presentation was delivered at K8SUG Singapore. See https://feryn.eu/presentations/accelerate-your-kubernetes-clusters-with-varnish-caching-k8sug-singapore-28-2024 for more details.
DevOps and Testing slides at DASA Connect - Kari Kakkonen
Slides by me and Rik Marselis from the DASA Connect conference on 30 May 2024. We discuss what testing is, then what agile testing is, and finally what testing in DevOps is. We finished with a lovely workshop in which participants tried to find different ways to think about quality and testing in different parts of the DevOps infinity loop.
A tale of scale & speed: How the US Navy is enabling software delivery from l... - sonjaschweigert1
Rapid and secure feature delivery is a goal across every application team and every branch of the DoD. The Navy’s DevSecOps platform, Party Barge, has achieved:
- Reduction in onboarding time from 5 weeks to 1 day
- Improved developer experience and productivity through actionable findings and reduction of false positives
- Maintenance of superior security standards and inherent policy enforcement with Authorization to Operate (ATO)
Development teams can ship efficiently and ensure applications are cyber ready for Navy Authorizing Officials (AOs). In this webinar, Sigma Defense and Anchore will give attendees a look behind the scenes and demo secure pipeline automation and security artifacts that speed up application ATO and time to production.
We will cover:
- How to remove silos in DevSecOps
- How to build efficient development pipeline roles and component templates
- How to deliver security artifacts that matter for ATO’s (SBOMs, vulnerability reports, and policy evidence)
- How to streamline operations with automated policy checks on container images
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -... - DanBrown980551
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses;
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
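For the Python side, a minimal sketch with the pypowsybl binding: load a bundled test network and run an AC power flow, which mirrors the kind of step the notebook walks through.

```python
# Minimal sketch: build the bundled IEEE 14-bus test network with
# pypowsybl and run an AC power flow on it.
import pypowsybl as pp

network = pp.network.create_ieee14()
results = pp.loadflow.run_ac(network)
print(results[0].status)           # convergence status of the main component
print(network.get_buses().head())  # bus data after the power flow
```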
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
Essentials of Automations: Optimizing FME Workflows with Parameters - Safe Software
Are you looking to streamline your workflows and boost your projects’ efficiency? Do you find yourself searching for ways to add flexibility and control over your FME workflows? If so, you’re in the right place.
Join us for an insightful dive into the world of FME parameters, a critical element in optimizing workflow efficiency. This webinar marks the beginning of our three-part “Essentials of Automation” series. This first webinar is designed to equip you with the knowledge and skills to utilize parameters effectively: enhancing the flexibility, maintainability, and user control of your FME projects.
Here’s what you’ll gain:
- Essentials of FME Parameters: Understand the pivotal role of parameters, including Reader/Writer, Transformer, User, and FME Flow categories. Discover how they are the key to unlocking automation and optimization within your workflows.
- Practical Applications in FME Form: Delve into key user parameter types including choice, connections, and file URLs. Allow users to control how a workflow runs, making your workflows more reusable. Learn to import values and deliver the best user experience for your workflows while enhancing accuracy.
- Optimization Strategies in FME Flow: Explore the creation and strategic deployment of parameters in FME Flow, including the use of deployment and geometry parameters, to maximize workflow efficiency.
- Pro Tips for Success: Gain insights on parameterizing connections and leveraging new features like Conditional Visibility for clarity and simplicity.
We’ll wrap up with a glimpse into future webinars, followed by a Q&A session to address your specific questions surrounding this topic.
Don’t miss this opportunity to elevate your FME expertise and drive your projects to new heights of efficiency.
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf - Paige Cruz
Monitoring and observability aren't traditionally found in software curricula, and many of us cobble this knowledge together from whatever vendor or ecosystem we were first introduced to and whatever is part of our current company's observability stack.
While the dev and ops silo continues to crumble, many organizations still relegate monitoring and observability to the purview of ops, infra, and SRE teams. This is a mistake - achieving a highly observable system requires collaboration up and down the stack.
I, a former op, would like to extend an invitation to all application developers to join the observability party, and will share these foundational concepts to build on:
Transcript: Selling digital books in 2024: Insights from industry leaders - T... - BookNet Canada
The publishing industry has been selling digital audiobooks and ebooks for over a decade and has found its groove. What’s changed? What has stayed the same? Where do we go from here? Join a group of leading sales peers from across the industry for a conversation about the lessons learned since the popularization of digital books, best practices, digital book supply chain management, and more.
Link to video recording: https://bnctechforum.ca/sessions/selling-digital-books-in-2024-insights-from-industry-leaders/
Presented by BookNet Canada on May 28, 2024, with support from the Department of Canadian Heritage.
Epistemic Interaction - tuning interfaces to provide information for AI support - Alan Dix
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.