High-value analytics in financial services (FS) are being enabled by graph, machine learning and Spark technologies. To make these real at production scale, HPC technologies are more appropriate than commodity clusters.
Loras College 2016 Business Analytics Symposium Keynote - Rich Clayton
Leaders who embrace data have a profound impact on their organizations, yet too few seize the opportunity. Biases in decision making, technology myths, data quality and analytical skills are the most frequently cited obstacles by organizations of all sizes. Technology advances have neutralized the scale advantage and have democratized analytics for every organization – so now what? Are you ready to engage more data in your management decisions? Do you have an analytic strategy that has two speeds – one for innovation and one for scale? Are you investing in your top talent so they can ask new questions?
We’ll explore these topics and how to create an analytic culture in your organization. We’ll share how leaders have transformed their organizations by innovating their analytic processes, re-designing the way they work and embracing new technology innovation. We’ll dispel myths about technology and provide you a foundation for building your journey to analytic excellence.
"Empower Developers with HPE Machine Learning and Augmented Intelligence", Dr...Dataconomy Media
"Empower Developers with HPE Machine Learning and Augmented Intelligence", Dr. Abdourahmane Faye, Big Data SME Lead DACH at HPE
Watch more from Data Natives Berlin 2016 here: http://bit.ly/2fE1sEo
Visit the conference website to learn more: www.datanatives.io
Follow Data Natives:
https://www.facebook.com/DataNatives
https://twitter.com/DataNativesConf
Stay Connected to Data Natives by Email: Subscribe to our newsletter to get the news first about Data Natives 2017: http://bit.ly/1WMJAqS
About the Author:
Abdou Faye is a Subject Matter Expert in Big Data, Predictive Analytics / Machine Learning and Business Intelligence, with more than 19 years of experience in the area in various leading and executive roles, from technical, architecture and sales perspectives. He recently joined HPE from SAP, where he had led the Predictive Analysis & Big Data CoE (Center of Excellence) business since 2010 for the DACH, CEE and CIS regions, in charge of Business Development and Sales Support. Prior to SAP, he worked four years at Microsoft as a Senior BI & SQL Server Consultant in Switzerland, after 10 years spent at Philip Morris (CH), Orange Telco (CH) and SEMA Group (FR). Abdou graduated from Paris 11 University in 2000, where he completed a PhD on Data Mining/Predictive Analytics after completing a Master's in Computer Science.
Big Data & Analytics continues to redefine business. Data has transitioned from an underused asset to the lifeblood of the organisation, and a critical component of business intelligence, insight and strategy.
Big Data Scotland is the largest annual data analytics conference held in Scotland: it is supported by ScotlandIS and The Data Lab and free for delegates to attend. The conference is geared towards senior technologists and business leaders and aims to provide a unique forum for knowledge exchange, discussion and cross-pollination.
The programme will explore the evolution of data analytics; looking at key tools and techniques and how these can be applied to deliver practical insight and value. Presentations will span a wide array of topics from Data Wrangling and Visualisation to AI, Chatbots and Industry 4.0.
Key Topics
• Tools and techniques
• Corporate data culture, business processes, digital transformation
• Business intelligence, trends, decision making
• AI, Real-time Analytics, IoT, Industry 4.0, Robotics
• Security, regulation, privacy, consent, anonymization
• Data visualisation, interpretation and communication
• CRM and Personalisation
Agile Big Data Analytics Development: An Architecture-Centric Approach - SoftServe
Presented at The Hawaii International Conference on System Sciences by Hong-Mei Chen and Rick Kazman (University of Hawaii), Serge Haziyev (SoftServe).
Enterprise Search: Addressing the First Problem of Big Data & Analytics - StampedeCon
This session addresses the first problems of Big Data & Analytics – identifying, indexing, connecting and gaining insight from existing data to drive value. HPE's Chief Field Technologist will give her perspectives on Enterprise Search as a fundamental cornerstone of building a data-driven enterprise.
DATA SCIENCE IS CATALYZING BUSINESS AND INNOVATION - Elvis Muyanja
Today, data science is enabling companies, governments, research centres and other organisations to turn their volumes of big data into valuable and actionable insights. It is important to uncover hidden patterns, unknown correlations, market trends, customer preferences and other useful business information. According to the McKinsey Global Institute, the U.S. alone could face a shortage of about 190,000 data scientists and 1.5 million managers and analysts who can understand and make decisions using big data by 2018. In coming years, data scientists will be vital to all sectors – from law and medicine to media and nonprofits. Has the African continent planned to train the next generation of data scientists required on the continent?
The Modern Data Architecture for Predictive Analytics with Hortonworks and Revolution Analytics
Hortonworks and Revolution Analytics have teamed up to bring the predictive analytics power of R to Hortonworks Data Platform.
Hadoop, a disruptive data processing framework, has made a large impact on today's data ecosystems. Enabling business users to translate existing skills to Hadoop is necessary to encourage adoption and to let businesses get value out of their Hadoop investment quickly. R, a prolific and rapidly growing data analysis language, now has a place in the Hadoop ecosystem.
This presentation covers:
- Trends and business drivers for Hadoop
- How Hortonworks and Revolution Analytics play a role in the modern data architecture
- How you can run R natively in Hortonworks Data Platform to simply move your R-powered analytics to Hadoop
Presentation replay at:
http://www.revolutionanalytics.com/news-events/free-webinars/2013/modern-data-architecture-revolution-hortonworks/
Very many IT experts know of Big Data, or at least have some notion of it. In practice, however, only a few in Germany currently work with it. Yet Big Data brings entirely new momentum to modern software solutions and has become indispensable in the context of mobile, cloud and social change. Big Data makes software intelligent, letting users experience it in an entirely new way. Big Data gives rise to new software architectures, because information is processed in a completely different way: faster, in a more differentiated manner, and often with the goal of drawing conclusions and making predictions.
This talk explains how modern software architectures are designed so that Big Data paradigms can be implemented successfully, and what advantages result for increasingly mobile software solutions. We also take a look at the potential and options in industries such as banking, insurance and retail.
"Industrializing Machine Learning – How to Integrate ML in Existing Businesse...Dataconomy Media
"Industrializing Machine Learning – How to Integrate ML in Existing Businesses", Erik Schmiegelow, CEO at Hivemind Technologies AG
Watch more from Data Natives Berlin 2016 here: http://bit.ly/2fE1sEo
Visit the conference website to learn more: www.datanatives.io
Follow Data Natives:
https://www.facebook.com/DataNatives
https://twitter.com/DataNativesConf
Stay Connected to Data Natives by Email: Subscribe to our newsletter to get the news first about Data Natives 2017: http://bit.ly/1WMJAqS
About the Author:
Since 1996, Erik Schmiegelow has worked as a software architect and consultant, building large data processing platforms for companies such as NTT DoCoMo, Royal Mail, Siemens, E-Plus, Allianz and T-Mobile; until 2001 he was CTO at the Cologne-based digital agency denkwerk.
In 2007 he founded the telecommunications consulting agency Itellity, followed by Hivemind Technologies in 2014. Hivemind Technologies is a solutions and services company focused on big data analytics and stream processing technologies for web, social data and industrial applications. Erik studied computer science in Hamburg.
"Big Data Use Cases" was presented to Lansing BigData and Hadoop Users Group Kickoff meeting on 2/24/2015 by Vijay Mandava and Lan Jiang. The demo was built on top of CDH 5.3, HDP 2.2 and AWS cloud
Big data has been one of the most popular terms in the IT industry over the past decade. The term is vague and broad enough that essentially every one of us is living in a big-data world. Every time you do a Google search, like a post on Facebook, write something in WeChat or view an item on Amazon, you both use and contribute to someone's big data system. Managing so much data across many computers introduces unique challenges. In this talk, we review the landscape of big data platforms and discuss some lessons we learned from building them.
Overview of analytics and big data in practice - Vivek Murugesan
Intended to give an overview of analytics and big data in practice, with a set of industry use cases from different domains. Useful for someone who is trying to understand Analytics and Big Data.
Strategizing Big Data in Telco
Big data is a very hot topic nowadays. Some industries depend on it completely, some have opportunities to roll out their strategies and execute, and some are just considering when the right time is to hop in.
To my mind, Big Data is not about technology. Big Data is about people generating data and data used for the benefit of people.
Big data is a pool of activities aimed at processing the data a company owns (internal and external) so as to open new revenue opportunities, minimize costs and enhance UX.
I had some ideas and thoughts on where telecommunication companies might start in formulating a Big Data strategy, and packed some of the most important ones into a small presentation.
What is the difference between Small Data and Big Data?
What kind of data is used currently, and which kind will the new paradigm rely on?
What kind of products are expected from telcos?
My personal ranking of operators in terms of their Big Data execution
What are the stages telcos should pass through to become a Big Data operator?
Prerequisites for Big Data transformation
Please take a look at the presentation to find answers to these questions and feel free to share your opinion.
Thanks!
Big Data Analysis Patterns with Hadoop, Mahout and Solr - boorad
Big Data Analysis Patterns: Tying real world use cases to strategies for analysis using big data technologies and tools.
Big data is ushering in a new era for analytics with large scale data and relatively simple algorithms driving results rather than relying on complex models that use sample data. When you are ready to extract benefits from your data, how do you decide what approach, what algorithm, what tool to use? The answer is simpler than you think.
This session tackles big data analysis with a practical description of strategies for several classes of application types, identified concretely with use cases. Topics include new approaches to search and recommendation using scalable technologies such as Hadoop, Mahout, Storm, Solr, & Titan.
Extracting value from Big Data is not easy. The field of technologies and vendors is fragmented and rapidly evolving. End-to-end, general purpose solutions that work out of the box don’t exist yet, and Hadoop is no exception. And most companies lack Big Data specialists. The key to unlocking real value lies with thinking smart and hard about the business requirements for a Big Data solution. There is a long list of crucial questions to think about. Is Hadoop really the best solution for all Big Data needs? Should companies run a Hadoop cluster on expensive enterprise-grade storage, or use cheap commodity servers? Should the chosen infrastructure be bare metal or virtualized? The picture becomes even more confusing at the analysis and visualization layer. The answer to Big Data ROI lies somewhere between the herd and nerd mentality. Thinking hard and being smart about each use case as early as possible avoids costly mistakes in choosing hardware and software. This talk will illustrate how Deutsche Telekom follows this segmentation approach to make sure every individual use case drives architecture design and the selection of technologies and vendors.
Big Data Day LA 2015 - Data Lake - Rebirth of Enterprise Data Thinking by Ra... - Data Con LA
Why and how the Big Data based Enterprise Data Lake solution, built on NoSQL and SQL technologies, has become significantly more effective at solving enterprise data challenges than its predecessor, the EDW, which tried and failed to solve the same problem based entirely on SQL databases.
Extract business value by analyzing large volumes of multi-structured data from various sources such as databases, websites, blogs, social media, smart sensors...
Sustainability Investment Research Using Cognitive Analytics - Cambridge Semantics
In this webinar Anthony J. Sarkis, Chief Strategy Officer at Parabole, and Steve Sarsfield, VP Product at Cambridge Semantics, explore how portfolio managers are using the recently developed Parabole/ AnzoGraph DB integration as their underlying infrastructure for conducting ML and cognitive analytics at scale to exploit data to identify potential risks and new opportunities.
Top Big Data Analytics Tools: Emerging Trends and Best Practices - SpringPeople
For many IT experts, big data analytics tools and technologies are now a top priority. This deck surveys the top big data analytics tools for initiating and advancing the process of big data analysis.
Customer value analysis of big data products - Vikas Sardana
Business value analysis through a Customer Value Model for software technology choices, with a case study of a Big Data use case from the mobile advertising industry.
ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture - DATAVERSITY
Whether to take data ingestion cycles off the ETL tool and the data warehouse or to facilitate competitive Data Science and building algorithms in the organization, the data lake – a place for unmodeled and vast data – will be provisioned widely in 2020.
Though it doesn’t have to be complicated, the data lake has a few key design points that are critical, and it does need to follow some principles for success. Build the data lake, not the data swamp! The tool ecosystem is building up around the data lake and soon many will have a robust lake and data warehouse. We will discuss policy to keep them straight, send data to its best platform, and keep users’ confidence up in their data platforms.
Data lakes will be built in cloud object storage. We’ll discuss the options there as well.
Get this data point for your data lake journey.
Data Lake Acceleration vs. Data Virtualization - What’s the difference? - Denodo
Watch full webinar here: https://bit.ly/3hgOSwm
Data Lake technologies have been in constant evolution in recent years, with each iteration promising to fix what previous ones failed to accomplish. Several data lake engines are hitting the market with better ingestion, governance, and acceleration capabilities that aim to create the ultimate data repository. But isn't that the promise of a logical architecture with data virtualization too? So, what’s the difference between the two technologies? Are they friends or foes? This session will explore the details.
Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ... - Precisely
Tackling the challenge of designing a machine learning model and putting it into production is the key to getting value back – and the roadblock that stops many promising machine learning projects. After the data scientists have done their part, engineering robust production data pipelines has its own set of challenges. Syncsort software helps the data engineer every step of the way.
Building on the process of finding and matching duplicates to resolve entities, the next step is to set up a continuous streaming flow of data from data sources so that as the sources change, new data automatically gets pushed through the same transformation and cleansing data flow – into the arms of machine learning models.
Some of your sources may already be streaming, but the rest sit in transactional databases that change hundreds or thousands of times a day. The challenge is that you can’t affect the performance of data sources that run key applications, so putting something like database triggers in place is not the best idea. Using Apache Kafka or similar technologies as the backbone for moving data around doesn’t by itself solve the problem: you still need to grab changes from the source, push them into Kafka, and consume the data from Kafka for processing. If something unexpected happens – like connectivity being lost on either the source or the target side – you don’t want to have to fix it or start over because the data is out of sync.
View this 15-minute webcast on-demand to learn how to tackle these challenges in large scale production implementations.
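As a rough illustration of the pipeline described above, here is a generic sketch (using the kafka-python library with hypothetical topic and server names; it is not Syncsort's actual product code) in which changes are pushed into Kafka by a producer and pulled by a consumer that commits offsets only after processing, so a connectivity loss on either side does not leave the data out of sync:

```python
# Generic change-stream sketch with kafka-python; names are hypothetical.
import json
from kafka import KafkaProducer, KafkaConsumer

# Producer side: push each captured source change into Kafka.
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)
producer.send("source-changes", {"table": "accounts", "op": "update", "id": 42})
producer.flush()

# Consumer side: commit offsets only after successful processing, so an
# interruption means re-reading from the last committed offset, not lost data.
consumer = KafkaConsumer(
    "source-changes",
    bootstrap_servers="localhost:9092",
    group_id="ml-feature-pipeline",
    enable_auto_commit=False,
)
for msg in consumer:
    change = json.loads(msg.value)
    # ... apply the same transformation/cleansing flow, then feed the model ...
    consumer.commit()  # mark progress only once the record is fully handled
```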
The Practice of Big Data - The Hadoop ecosystem explained with usage scenarios - kcmallu
What's the origin of Big Data? What are the real life usage scenarios where Hadoop has been successfully adopted? How do you get started within your organizations?
Big Data brings big promise and also big challenges, the primary and most important one being the ability to deliver Value to business stakeholders who are not data scientists!
Big Data Open Source Tools and Trends: Enable Real-Time Business Intelligence...Perficient, Inc.
Most organizations still rely on batch and offline processing of data streams to gain meaningful analysis and insight into their business. However, in our instant gratification world, real-time computation and analysis of streaming data is crucial in gaining insight into patterns and threats. A trend is emerging for real-time and instant analysis from live data streams, promoting the value of logs and a move toward functional programming.
This shift in technology is not about what and how to store the data, but what we can do with it to see emerging patterns and trends across multiple resources, applications, services and environments. Log data represents a wealth of information, yet is often sporadic, unstructured, scattered across the enterprise and difficult to track.
These slides provide insights into some of the most helpful Big Data tools used by the largest social media and data-centric organizations for competitive trends, instant analysis and feedback from large-volume data streams. We show how using the Big Data tools Storm, ElasticSearch and an elastic UI can turn application logs into real-time analytical views.
You will also learn how Big Data:
Contains data that is elastic, minimally structured, flexible and scalable
Helps process live streams into meaningful data
Promotes a move toward functional programming
Affects the enterprise data architecture
Works with real-time CEP tools like Storm for functional programming
DAMA & Denodo Webinar: Modernizing Data Architecture Using Data Virtualization Denodo
Watch here: https://bit.ly/2NGQD7R
In an era increasingly dominated by advancements in cloud computing, AI and advanced analytics, it may come as a shock that many organizations still rely on data architectures built before the turn of the century. But that scenario is rapidly changing with the increasing adoption of real-time data virtualization - a paradigm shift in the approach that organizations take towards accessing, integrating, and provisioning data required to meet business goals.
As data analytics and data-driven intelligence takes centre stage in today’s digital economy, logical data integration across the widest variety of data sources, with proper security and governance structure in place has become mission-critical.
Attend this session to learn:
- How you can meet cloud and data science challenges with data virtualization
- Why data virtualization is increasingly finding enterprise-wide adoption
- How customers are reducing costs and improving ROI with data virtualization
Data Engineer's Lunch #60: Series - Developing Enterprise Consciousness - Anant Corporation
In Data Engineer's Lunch #60, Rahul Singh, CEO here at Anant, will discuss modern data processing/pipeline approaches.
Want to learn about modern data engineering patterns & practices for global data platforms? A high-level overview of different types, frameworks, and workflows in data processing and pipeline design.
Options for Data Prep - A Survey of the Current Market - Dremio Corporation
Data comes in many shapes and sizes, and every company struggles to find ways to transform, validate, and enrich data for multiple purposes. The problem has been around as long as data, and the market has an overwhelming number of options. In this presentation we look at the problem and key options from vendors in the market today. Dremio is a new approach that eliminates the need for stand alone data prep tools.
ADV Slides: Platforming Your Data for Success – Databases, Hadoop, Managed Ha... - DATAVERSITY
Thirty years is a long time for a technology foundation to be as active as relational databases. Are their replacements here? In this webinar, we say no.
Databases have not sat around while Hadoop emerged. The Hadoop era generated a ton of interest and confusion, but is it still relevant as organizations are deploying cloud storage like a kid in a candy store? We’ll discuss what platforms to use for what data. This is a critical decision that can dictate two to five times additional work effort if it’s a bad fit.
Drop the herd mentality. In reality, there is no “one size fits all” right now. We need to make our platform decisions amidst this backdrop.
This webinar will distinguish these analytic deployment options and help you platform 2020 and beyond for success.
Powering Real-Time Big Data Analytics with a Next-Gen GPU Database - Kinetica
Freed from the constraints of storage, network and memory, many big data analytics systems are now routinely revealing themselves to be compute bound. To compensate, big data analytic systems often sprawl horizontally (300-node Spark or NoSQL clusters are not unusual!) to bring in enough compute for the task at hand. High system complexity and crushing operational costs often result. As the world shifts from physical to virtual assets and methods of engagement, there is an increasing need for systems of intelligence to live alongside the more traditional systems of record and systems of analysis. New approaches to data processing are required to support the real-time processing that drives these systems of intelligence.
Join 451 Research and Kinetica to learn:
•An overview of the business and technical trends driving widespread interest in real-time analytics
•Why systems of analysis need to be transformed and augmented with systems of intelligence bringing new approaches to data processing
•How a new class of solution - a GPU-accelerated, scale-out, in-memory database - can bring you orders of magnitude more compute power, a significantly smaller hardware footprint, and unrivaled analytic capabilities.
•Hear how other companies in a variety of industries, such as financial services, entertainment, pharmaceutical, and oil and gas, benefit from augmenting their legacy systems with a modern analytics database.
Bitkom Cray presentation - on HPC affecting big data analytics in FS
1. HPC Meets Big Data in Financial Services
Philip Filleul – Global Lead FS
2. Agenda
● Who is Cray
● Cray Vision and Products
● Breakthrough analytic technologies
● For each – Spark, Graph, Machine Learning
● What is it
● FS Use cases
● Technology needs to successfully deliver
● How Cray enables
● Key takeaways
3. Cray: The Myth vs. Reality
Myth: They are huge. Reality: They can be – but they start at less than a rack.
Myth: They are proprietary. Reality: No – Intel, Linux, open standards, Hadoop.
Myth: They are complex. Reality: Simpler and more productive than a grid.
Myth: They are expensive. Reality: No – cost competitive, lower TCO, higher value.
Three Focus Areas
• Computation
• Storage & Data Management
• Analytics
6. Where does Cray add value in FS?
Business lines:
• Asset Management: search for alpha; strategy confidence
• Wealth Management: roboadvice; scaling the mass affluent
• Securities: massive regulation; compliance burden; stress tests; commoditization
• Insurance: telematics; fraud
Cross-industry: technology commoditization and open source; cloud; big data analytics; cybersecurity
7. Cray’s Vision: The Fusion of Supercomputing and Big & Fast Data
Cray supercomputers solve “grand challenges” in science, engineering and analytics – modeling the world at the intersection of supercomputing and big data analytics:
● Data-Intensive Processing: high-throughput event processing and data capture from sensors, data feeds and instruments
● Math Models: modeling and simulation augmented with data to provide the highest-fidelity virtual-reality results
● Data Models: integration of datasets and math models for search, analysis, predictive modeling and knowledge discovery
Together these constitute High Performance Data Analytics (HPDA).
8. Cray Product Range and FS Applicability
XC40 Supercomputer: Aries interconnect; single memory; scalability; package density; grid compatibility; upgradeability; integrated stack; best-in-class power and cooling. FS applicability: risk/pricing; CVA; machine learning; superfast data sharing; a specialist within the grid; surprisingly low TCO.
CS400 Cluster Supercomputer: NVIDIA GPU density; proven at scale; integrated h/w and s/w stack; developer productivity. FS applicability: risk/pricing; options FFT; algo backtesting; deep learning.
9. Cray Product Range and FS Applicability
Lustre Parallel File System: single POSIX namespace; modular scaling from 7.5 GB/s to 1.7 TB/s; integrated and preconfigured; reliability and availability at scale. FS applicability: high throughput for the algo analytics pipeline; converged storage across grid, analytics and Hadoop.
Archive: multi-tier, single-namespace archive; rule-based policy migration; flexible integration with most OEM tape and disk; preconfigured and integrated. FS applicability: data lake archival; analytical data archival; market data archival; data no longer ‘deep-sixed’.
10. Cray Product Range and FS Applicability
Urika-GD Graph Discovery Appliance: the most scalable graph processor available; whole-graph analytics possible; open RDF/SPARQL; single memory space and an extreme threaded processor. FS applicability: surveillance; cybersecurity; ontology-based transaction compliance.
Urika-XA Extreme Analytics Platform: Cloudera 5.2/YARN; open to non-CDH apps; dense compute and memory; SSD layer for HDFS; Lustre/POSIX for scale-out storage. FS applicability: Spark-optimized; real-time streaming analytics converged with regular analytics; machine learning.
11. Breakthrough Analytic Technologies
● Growing CPU capability, commoditization, memory size and I/O bandwidth have made some new software technologies explode:
● Spark
● Graph
● Machine Learning
● For each:
● What are they?
● Why are they important in FS?
● What technology attributes do they need to deliver on the promise?
● How does Cray enable them?
12. Spark
● What is the technology
● General purpose, productive analytic technology
● Open source, target of much development work
● Memory first, shared data
● Base ecosystem for e.g. GraphX and MLlib
● FS Use Cases:
● Risk analytics
● Real-time alerting and dashboarding
● Web clickstream rapid ETL for CSRs
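To ground these bullets, a minimal PySpark sketch (the file path and column names are hypothetical) of Spark's memory-first, shared-data style: one cached DataFrame feeds both a risk aggregation and a real-time-alert style filter without re-reading from disk.

```python
# Minimal PySpark sketch; path and columns are illustrative assumptions.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("risk-analytics-sketch").getOrCreate()

# Load a (hypothetical) trades dataset once and keep it in memory:
# Spark's "memory first" model lets many analyses share the cached data.
trades = spark.read.parquet("/data/trades.parquet").cache()

# Several analyses reuse the cached DataFrame.
exposure_by_desk = trades.groupBy("desk").agg(F.sum("notional").alias("exposure"))
large_trades = trades.filter(F.col("notional") > 1e8)  # alert-style filter

exposure_by_desk.show()
print("large trades:", large_trades.count())
```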
13. Spark Technology Needs
(Slide diagram: compute nodes shuffle blocks over the interconnect; intermediate results spill over from memory to SSD, recommended for the latency/size balance; HDD holds job input and output; HDFS vs. a parallel file system for high bandwidth and for scaling disk separately from compute.)
Performance recommendations:
- Fast interconnect
- SSD per node
- Shared parallel filesystem
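As a concrete illustration of these recommendations, a hedged configuration sketch (the property keys are standard Spark settings; the paths and values are assumptions) directing shuffle spill to a per-node SSD and keeping job input/output on a shared parallel file system:

```python
# Spark configuration sketch reflecting the slide's hardware advice.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("spark-on-hpc-sketch")
    # Spill/shuffle scratch space on the per-node SSD (path is hypothetical).
    .config("spark.local.dir", "/ssd/spark-scratch")
    # Favor keeping intermediate results in memory before spilling.
    .config("spark.memory.fraction", "0.7")
    .getOrCreate()
)

# Job input and output live on a shared parallel file system (e.g. Lustre)
# mounted at a POSIX path instead of node-local HDFS; paths are hypothetical.
df = spark.read.csv("file:///lustre/jobs/input.csv", header=True)
df.write.mode("overwrite").parquet("file:///lustre/jobs/output.parquet")
```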
14. What is Graph
A traditional RDBMS is GOOD at:
- Rapid update
- Simple queries about items
But BAD at:
- Relationships between data items
- Patterns of relationships
- Interactions between many data items (e.g. a suspicious pattern of actions)
Graph databases operate entirely in memory.
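Urika-GD speaks open RDF/SPARQL, so as a small local illustration (using Python's rdflib with toy data – an assumption for the sketch, not the appliance's own tooling), here is the kind of relationship-pattern query that is awkward as a multi-join in an RDBMS but natural on a graph:

```python
# Graph "pattern of relationships" sketch with rdflib (pip install rdflib).
from rdflib import Graph, Namespace

EX = Namespace("http://example.org/")
g = Graph()

# Toy data: two traders communicate, and one trades a restricted stock.
g.add((EX.traderA, EX.communicatedWith, EX.traderB))
g.add((EX.traderB, EX.traded, EX.stockX))
g.add((EX.stockX, EX.onRestrictedList, EX.listQ1))

# One SPARQL pattern walks the whole relationship chain - the kind of
# multi-join query that degrades quickly in a traditional RDBMS.
query = """
PREFIX ex: <http://example.org/>
SELECT ?a ?b ?stock WHERE {
  ?a ex:communicatedWith ?b .
  ?b ex:traded ?stock .
  ?stock ex:onRestrictedList ?list .
}
"""
for row in g.query(query):
    print(f"{row.a} -> {row.b} traded restricted {row.stock}")
```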
15. Discovering New Risk/Compliance Events
● Goal: Find detection patterns and improve the efficiency of the investigation process by reducing false positives
● Data sets: Accounts, customer transactions, 3rd-party data feeds, detection and case management systems
● Technical challenges: Rigid detection-system schemas and rules; constantly degrading performance as new data comes in; hard to tune performance with new data; long data on-boarding timeframes; manual disposition of benign alerts
● Users: Investigators, analysts
● Usage model: Tune detection-system models via data discovery; enhance, improve and augment the alert investigation process
● Augmenting: Existing detection systems
(Slide graphic: an entity graph linking Trader, StockSymbol, RestrictedTradingList, LegalEmployee, ITEmployee, Department, CounterParty, network events (SourceIP, DestIP, Port, Protocol, Type, DateTime), BadgeLogs (EntryTime, ExitTime, Location), SystemWithAdminRights, PolicyViolations, transaction dates and types, restriction dates, and CommunicationEvents with record type and time.)
16. Discovering Customer Churn Drivers
● Goal: Identify correlations between service events (truck rolls, call escalations, customer service rep experience level, set-top box reliability…) and customer churn
● Data sets: Customer records, historic billing records, IVR, HR/training records, customer surveys, network operations data, work orders…
● Technical challenges: Volume, variety and velocity of data; disconnected and disparate data sources from operational lines of business and 3rd-party contractors
● Users: Customer operations analysts
● Usage model: Analyze relationships between service and related events and eventual customer contract outcomes
● Augmenting: Existing data warehouse appliances
(Slide graphic: an event graph linking Customers, Call Center Events, Work Orders, Call Escalations, Truck Rolls, Set-Top Box Feeds, Supervisor Intervention, 3rd Party Service Tech, AVR Failure, CSR Resolution, Cabinet Failure, Inexperienced CSR Event Resolutions, Residential Accounts, Commercial Accounts and Web Service.)
17. Mphasis NextAngles: A Disruptive Approach
Regulations & policies and data & IT systems sit in two silos; today, sample audits connect the two – the old-world solution, and an inadequate one. NextAngles bridges the two silos through knowledge models – the new-world solution:
1. Regulations are deconstructed into computer-understandable rules
2. Rules are applied to smart data
3. This application happens through a knowledge model
18. How NextAngles Works
NextAngles is a massively scalable, “living” model of the bank. The customer’s systems feed in reference and transaction data – “facts” such as parties, accounts/GL/positions, and transactions and events over time (T1…T7) – which is converted to “smart data”. Encoded regulations and policies and encoded banking knowledge supply rules over a concept model and a context model (line of business, legal entities, geographies, customer segments, organization structure, processes). The rules produce inferences – potential violations, prohibited activities, operational risk measures, data problems – which surface through dashboards and investigation tools.
19. ENABLER #1: SMART DATA
Making the data computer-intelligible.
● What is it? Data stored as computer-intelligible “graphs” of subject-predicate-object statements over classes.
● How is it enabled? Formal standards from the W3C and other bodies; over 12 years the Semantic Web has evolved into a full ecosystem of products and practices.
● Value proposition: an order-of-magnitude reduction in handling real-world data complexity.
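To show what “smart data” looks like in practice, a small sketch (again rdflib, with hypothetical field and namespace names) turning one flat banking record into subject-predicate-object statements:

```python
# Turning a flat record into subject-predicate-object "smart data" (rdflib).
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF, XSD

EX = Namespace("http://example.org/bank#")
g = Graph()
g.bind("ex", EX)

# A flat transaction row, as it might come from a core banking system.
row = {"txn_id": "T42", "from": "ACC1", "to": "ACC2", "amount": 9500.0}

# Each field becomes a statement about the transaction.
txn = EX[row["txn_id"]]
g.add((txn, RDF.type, EX.Transaction))
g.add((txn, EX.fromAccount, EX[row["from"]]))
g.add((txn, EX.toAccount, EX[row["to"]]))
g.add((txn, EX.amount, Literal(row["amount"], datatype=XSD.decimal)))

print(g.serialize(format="turtle"))
```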
20. ENABLER #2: RULES & CURATED KNOWLEDGE
Reliable, consistent and predictable application of reasoning and complex rules.
● What is it? Knowledge expressed as rules that are intrinsically part of the smart-data ecosystem: traditional rules need to be wired in to the data, while rules in a smart-data ecosystem fill in the gaps.
● How is it enabled? Built on the smart-data model, which helps computers reach the same conclusions as human knowledge workers.
● Value proposition: reduces the need for humans to intervene and define “how” to solve problems.
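As a toy picture of rules that derive new facts rather than being wired into application code, a pure-Python forward-chaining sketch (the rule, threshold and fact names are invented for illustration):

```python
# Forward-chaining rule sketch over triples; all names are hypothetical.
# Rule: a large transaction to a watch-listed account implies a potential violation.
facts = {
    ("T42", "amount", 9500.0),
    ("T42", "toAccount", "ACC2"),
    ("ACC2", "onWatchList", True),
}

def infer(facts, threshold=9000.0):
    """Apply the rule once and return any newly derived facts."""
    derived = set()
    for s, p, o in facts:
        if p == "amount" and o > threshold:
            for s2, p2, o2 in facts:
                # Same transaction, destination account on the watch list?
                if s2 == s and p2 == "toAccount" and (o2, "onWatchList", True) in facts:
                    derived.add((s, "flagged", "potential-violation"))
    return derived

facts |= infer(facts)
print(facts)  # now includes ('T42', 'flagged', 'potential-violation')
```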
21. ENABLER #3: WORKSPACES
A rethink of enterprise applications for knowledge workers.
● What is it? A complete rethink of user interfaces around smart data and knowledge models; a semantic, knowledge-base-driven “noun-verb” paradigm; “workspaces” are the context where users work through an enquiry, built from six widgets: faceted search, view, list, visualize, history, and forms/wizards.
● Value proposition: solves the “I need Excel” problem, the swivel-chair problem and the vocabulary problem.
22. ENABLER #4: LEARNING
Continuous improvement of efficiency and effectiveness through learning
What is it? NextAngles learns from user behavior to help pre-populate workspaces, and learns how users use tools to perform tasks, proactively bringing up those tools when it sees a similar situation. Interim work products can be turned into future automation.
How is it enabled? User behavior in the user interface is tracked in detail and encoded into smart data; learning algorithms eliminate dead ends and build an optimal path to the answers.
Value proposition: The effort for manual tasks falls over time, giving almost "custom screens" for 1000s of subtle variations. Supervisors can short-circuit the learning engine to "pre-configure" workspaces. A minimal sketch of this kind of behavioral learning follows below.
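The deck does not describe the learning engine's internals; as a hedged illustration of the general idea only, the sketch below mines a hypothetical log of (situation, tool) events and suggests the tool most often used in a similar situation. The log format and situation keys are invented.

```python
# Illustrative sketch only: suggest the tool a user most often reaches for
# in a similar situation, based on tracked UI behavior. The log format and
# keys are invented; the real learning engine is not described in the deck.
from collections import Counter, defaultdict

# (situation, tool chosen) pairs, as might be harvested from UI tracking
event_log = [
    ("aml-alert/high-value", "transaction-history"),
    ("aml-alert/high-value", "counterparty-graph"),
    ("aml-alert/high-value", "transaction-history"),
    ("kyc-review/new-account", "document-viewer"),
]

counts = defaultdict(Counter)
for situation, tool in event_log:
    counts[situation][tool] += 1

def suggest_tool(situation: str):
    """Return the historically most-used tool for this situation, if any."""
    seen = counts.get(situation)
    return seen.most_common(1)[0][0] if seen else None

print(suggest_tool("aml-alert/high-value"))  # -> "transaction-history"
```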
23. Anti-Money Laundering: Solutions to a Real Problem
Challenges:
● A backlog of investigations due to the large number of alerts
● Constantly changing AML rules and regulations
● Consolidating data from various systems within and outside the bank
● Balancing the load with limited resources
24. Urika-GD: Purpose-built for data discovery
1,944 times faster!
"In the amount of time it takes to validate one hypothesis, we can now validate 1000 hypotheses – increasing our success rate significantly." – Dr. Ilya Shmulevich
Discovery problems: you do not know the relationships in the data; you do not know the desired insight or the right question to ask; you do not know the paths/linkages to explore across diverse data sets.
What Urika-GD provides:
● Access all data with uniform, low latency regardless of partitioning, layout or access pattern
● Investigate multiple, changing hypotheses in parallel without prefetching/caching
● Explore diverse data, fused without upfront modeling and independent of linkage/traversal path
Architecture: shared memory model, memory accelerator, in-memory graph analytical database.

                                                        # Processors   Time
Traditional approaches (after months of optimization)        48        10.8 hours
Cray                                                         32        30 sec
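One plausible derivation of the headline figure (not spelled out in the deck): 10.8 hours is 38,880 seconds, so the wall-clock speed-up is 38,880 / 30 = 1,296x; normalizing for processor count (48 vs. 32, a factor of 1.5) gives 1,296 x 1.5 = 1,944x, matching the "1,944 times faster" claim.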
25. Machine Learning
Machine learning is different:
● All the data vs. a sample; messy data is OK
● Correlation vs. causation
● Algorithms fine-tune themselves
26. Machine Learning use cases in FS
● Anomaly detection for compliance: rogue traders, "fat finger" errors; typical accuracy with decision trees is 70-75%, while deep neural nets exceed 90%, which can halve fraud costs
● Fraud and money laundering
● Trading strategies: risk and reward prediction over structured and unstructured data sources
● Personnel and customer management: recruiting/turnover prevention, CRM for trading platforms
27. Supervised Machine Learning
● First, label the data: human judgments on historic data, e.g. fraud or not fraud
● Then perform statistical analysis of the training data: the model finds correlations between the input data and the human-applied labels
● 1000s of features: events, state, temporal, graph
● Millions of fraud patterns
● Copes with noisy data
A minimal end-to-end sketch of this workflow follows below.
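As a hedged illustration of the label-then-train workflow, and of the decision-tree vs. deep-net accuracy gap cited on the previous slide, here is a minimal scikit-learn sketch on synthetic data. The features, data and resulting scores are invented and say nothing about real fraud models.

```python
# Minimal supervised-learning sketch on synthetic "fraud" data.
# Everything here (features, data, scores) is illustrative only.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score

# Synthetic stand-in for labeled historic data: rows are transactions,
# the label is a human judgment (fraud = 1, not fraud = 0).
X, y = make_classification(n_samples=5000, n_features=30,
                           n_informative=10, weights=[0.9, 0.1],
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# Train two supervised models and compare held-out accuracy.
tree = DecisionTreeClassifier(max_depth=5).fit(X_train, y_train)
net = MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=500,
                    random_state=0).fit(X_train, y_train)

print("decision tree:", accuracy_score(y_test, tree.predict(X_test)))
print("neural net:   ", accuracy_score(y_test, net.predict(X_test)))
```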
28. Deep Learning as the emerging Supervised Learning ML
● NVIDIA is the thought (and technology) leader in Deep Learning
● GPU technology is well suited to it
● Adopters include Google, Facebook and Microsoft
Especially successful for pattern recognition and feature extraction in speech, pictures and time-series.
29. Technology Needs of Machine Learning
● Highly parallel, any-to-any communication
● Dense compute, large memory, fast interconnect
● Deep learning: dense GPUs, depending on the toolset
Cray XC for large single-image memory scaling; Cray CS-Storm for dense GPUs for deep learning. This is a greater engineering challenge than you might think: Cray makes the world's densest, most scalable and RELIABLE GPU systems.
31. In Summary: New Analytics Technology Needs

Characteristic             Older Hadoop   Traditional HPC    Advanced Analytics
Interconnect               Slow           Fast/Intelligent   Fast/Intelligent
Single memory capability   No             Yes                Yes
High-bandwidth I/O         No             Yes                Yes
Node-local storage         Yes            No                 Hybrid
Compute density            Low            High               High
GPUs                       No             Yes                Yes
32. Summary
● Game-changing analytics technologies are arriving
● They have high-ROI use cases in FS
● Their technology demands do not align with traditional Hadoop clusters
● Their technology needs are closer to HPC
● Cray has great heritage, experience and technology
● Cray is designing new-age analytic products
Editor's Notes
Title Slide intended for Financial presentations
This chart shows key concerns in different parts of financial services and the many areas, within those key concerns, where Cray adds real value.
Cray is thought of as a classic maker of large supercomputers but is less well known in Financial Services. To enable world-class HPC scaling, Cray has invested in its own highly intelligent, performant interconnect; combined with a systems-engineering approach and commonly available commodity technologies, this has enabled us to produce open-standards-compliant, highly performant, reliable and competitively priced scalable systems that are highly relevant in FS.
The financial services industry is experiencing a wave of innovation enabled by big and fast data, at a time of unprecedented regulatory pressure and margin compression. Firms are being forced not only to cut costs and re-platform onto more cost-effective data platforms, such as those based on Hadoop, but also to embrace regulatory oversight and move to more real-time monitoring of their positions and risks, as well as areas like employee surveillance.
Let's look at risk management as an example.
Historically, the nightly value-at-risk (VaR) workload has been one of the cornerstones of banking IT.
It is highly parallel, running as a linear-algebra Monte Carlo model on tens to hundreds of thousands of cores in a large commodity grid. The pressure now is to move it closer to real time, for regulatory reasons and for sound business management. A minimal Monte Carlo VaR sketch follows below.
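For readers unfamiliar with the workload, here is a hedged, minimal Monte Carlo VaR sketch in Python/NumPy. The portfolio value, return distribution and all parameters are invented; production VaR engines are vastly more elaborate (full revaluation, correlated risk factors, fat tails and so on).

```python
# Minimal Monte Carlo value-at-risk (VaR) sketch. All parameters are
# illustrative assumptions only.
import numpy as np

rng = np.random.default_rng(seed=0)

portfolio_value = 100e6      # $100M portfolio, invented
mu, sigma = 0.0002, 0.012    # assumed daily return mean / volatility
n_scenarios = 1_000_000      # the embarrassingly parallel part

# Simulate one-day returns and convert to profit-and-loss.
returns = rng.normal(mu, sigma, n_scenarios)
pnl = portfolio_value * returns

# 99% one-day VaR: the loss exceeded in only 1% of scenarios.
var_99 = -np.percentile(pnl, 1)
print(f"99% 1-day VaR ~ ${var_99:,.0f}")
```

Each scenario is independent, which is why the production version spreads so naturally across a large grid, and why moving it toward real time is mostly a question of compute density and interconnect.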
This is a good example of a traditional HPC application looking more like a high performance data analytics application (HPDA).
Some firms are looking at Spark to re-platform this application. The technology needs of this new style of application are very different from those of the traditional commodity cluster: high bandwidth, high throughput, high concurrency and any-to-any communication, with an underlying HPC capability. All of these more demanding needs play well to Cray's strengths and to our newer system design points.
Cray is a technology leader in each of the areas of scalable compute, storage and analytics, but it is the combination of all these in this new style of application where we have unparalleled strength.
Spark performance is very complex, with many factors that are highly use-case dependent, but some broad generalities can be seen.
At the start of a job, data from disk needs to be brought into the nodes' memory.
With HDFS this data is normally stored 3x, in different places on the cluster.
An alternative is to store data on a parallel filesystem such as open-source Lustre. This allows data and compute to be scaled separately; the data has high throughput but slightly higher latency than an HDD on the node. It is therefore very suitable for the central storage of input and output data, rather than for intermediate results, where latency matters more than throughput.
In the map phase of processing, data is normally self-contained on the node. Intermediate results can rest in memory, on disk, or both. Depending on the use case, when larger intermediate results are needed, SSD is highly suitable: it is cheaper than memory and more extensible. HDD can be used, but latencies are much higher.
In the reduce phase of processing, results are aggregated across nodes; here we see the dependency on interconnect speed. The configuration sketch below shows where these storage tiers plug into Spark.
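As a hedged sketch of where these tiers plug in: in stock Apache Spark, shuffle and spill data goes wherever spark.local.dir points (for example, node-local SSD), while job input and output can live on a Lustre mount addressed as an ordinary POSIX path. The paths and column names below are invented examples.

```python
# Sketch: pointing Spark's storage tiers at the hardware discussed above.
# All paths and column names are invented examples; tune for your cluster.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("storage-tier-sketch")
    # Intermediate (shuffle/spill) data: node-local SSD, where latency matters.
    .config("spark.local.dir", "/mnt/nvme/spark-tmp")
    .getOrCreate()
)

# Input and output: a Lustre parallel filesystem mounted as an ordinary
# POSIX path, where throughput matters more than latency.
df = spark.read.parquet("file:///lustre/data/trades.parquet")
daily = df.groupBy("trade_date").sum("notional")   # map + shuffle + reduce
daily.write.mode("overwrite").parquet("file:///lustre/results/daily_notional")
```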
Key points:
Objective: identify financial risks posed by fraudulent / illegal activity before they incur massive punitive fines
Challenge: Enormous volumes of surveillance and trading data; key indicators of collusion or wrong-doing are deeply hidden, and many patterns need to be explored to identify malfeasance with significant certainty
Solution: Urika’s ability to fuse data from many sources into an enormous graph and search for hidden patterns allows 1000s of hypotheses to be explored in the time previously required to explore just one, increasing success rate.
Words:
The patience of the SEC and other regulators with financial services firms is at an all-time low, with hefty penalties being applied to corporate wrongdoing. Even worse is the damage done to the firm's reputation in the media scrum if such wrongdoing comes to light. It is more crucial than ever to have controls in place that detect and stop wrongdoing by employees before it can cause organizational harm, and the role of the corporate risk officer has grown accordingly.
Risk management is complex, however. Consider insider trading within the context of an investment bank. You have to identify traders in a particular equity who interact with others having inside knowledge. Even identifying "interactions" is complex, given that people might communicate by phone, email, in person, via an intermediary … tracking down such behaviour involves looking at data from a myriad of different sources and correlating it with complex trading patterns across time. And that's just one form of risk!
In addition to insider trading, risk and compliance officers are also tasked with detecting and implementing anti-money laundering procedures, identifying systemic risk factors through co-party risk analysis and many others.
Urika addresses these challenges. It provides a platform that allows risk investigators to fuse data from a wide variety of sources in real time and search for patterns of interaction that could indicate insider trading or other malfeasance. Urika fuses the data from multiple sources into a graph, and provides the means to pose complex, ad-hoc analytic queries across the entire dataset and obtain results in real time.
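To make the "fuse many sources into one graph, then query interactions" idea concrete, here is a toy sketch using the networkx Python library. The people, channels and data are invented, and this small in-memory example says nothing about Urika's actual engine or scale.

```python
# Toy sketch of graph fusion for interaction analysis. All data is invented;
# a real system would fuse phone, email, chat, trade and HR records at scale.
import networkx as nx

G = nx.MultiGraph()

# Fuse "interaction" edges from different source systems into one graph.
phone_calls = [("trader_A", "analyst_B"), ("analyst_B", "insider_C")]
emails = [("trader_A", "broker_D")]
meetings = [("broker_D", "insider_C")]

for u, v in phone_calls:
    G.add_edge(u, v, channel="phone")
for u, v in emails:
    G.add_edge(u, v, channel="email")
for u, v in meetings:
    G.add_edge(u, v, channel="meeting")

# Ad-hoc query: is there any chain of interactions, across channels,
# linking a trader to a known insider, directly or via intermediaries?
if nx.has_path(G, "trader_A", "insider_C"):
    print(nx.shortest_path(G, "trader_A", "insider_C"))
    # e.g. ['trader_A', 'analyst_B', 'insider_C']
```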
Key points:
Operational focus: identify the customers at the highest risk of churn and design a sticky package of services to aid retention
Challenge: continuous refinement of techniques used to identify at-risk customers, using data from variety of very large datasets
Solution: Urika – fuse datasets into a big graph, analyze patterns of interactions and churn and formulate patterns which can identify at-risk customers
Words:
Churn is a problem that plagues every telecoms company, particularly mobile providers. When it can cost hundreds of dollars to acquire a new customer, minimizing churn is a major strategic initiative with direct impact on profitability.
However, predicting churn is quite complex. One major international carrier realized that customer satisfaction could be measured using a variety of information sources: service events (outages, call escalations, customer rep experience), frequency and response to those service records, calling patterns, social media, influencers and a variety of other information. It’s not easy: the data comes in many different formats, and is very voluminous…
The customer used Urika to fuse together these many disparate sources of information, and searched for historical correlations between these various indicators and an eventual churn outcome. That analysis produced a set of patterns, which could then be applied to existing customers to predict who was likely to churn – long before customer dissatisfaction reached the point where they were considering churning. Analysis of their calling patterns and influencers enabled the creation of highly targeted offers to those customers, with very high acceptance rates. This solution offered the best of all worlds: it minimized costs by targeting the most at-risk customers exclusively and created very high value, sticky offers for those customers specifically.
As a side note, it should be mentioned that the telco quickly realized this data source could be used to discover many other things. Example: they were able to determine that there was an enormous number of unnecessary truck rolls, because a common way of dealing with customer complaints was to move the customer from one line to another at the central office – which placated that customer, but caused dissatisfaction for the customer who was displaced from the "good" line, resulting in another complaint and another truck roll…
Urika addresses the problem using a graph database that fuses all the data together, enabling a 360-degree view of all the relationships any particular entity is involved in. This enables sophisticated querying and analysis, including temporal analysis. Whole-graph analytics facilitates identifying key influencers and constructing a sticky, micro-targeted offer.
"Regulatory compliance & policies" and "data processing in IT systems" typically sit in two separate silos. With the accelerating change in regulations, it is impractical to try to bridge the gap through sample audits. NextAngles bridges the two silos through knowledge models.