Dev Lakhani, Data Scientist at Batch Insights talks on "Real Time Big Data Applications for Investment Banks and Financial Institutions" at the first Big Data Frankfurt event that took place at Die Zentrale, organised by Dataconomy Media
Introduction to Big Data Analytics on Apache Hadoop - Avkash Chauhan
In the age of Big Data and large-volume analytics there is a lot to cover and a lot to learn. While at Microsoft developing Windows HDInsight, and now developing a one-of-a-kind Big Data product at my own company, Big Data Perspective in San Francisco, I have spent the last several years covering Big Data at various levels. This talk is customized to help database and business intelligence (BI) professionals, programmers, Hadoop administrators, researchers, technical architects, operations engineers, data analysts, and data scientists understand the core concepts of Big Data Analytics on Hadoop. This webinar will be useful for those who want to know what Hadoop is and how they can take advantage of it by spending just a few dollars to run a cluster. The webinar is great for those who are looking to deploy their first data cluster and run MapReduce jobs to discover insights.
Hadoop has shown itself to be a great tool for resolving problems with data aspects such as Velocity, Variety and Volume that cause trouble for relational database storage. In this presentation you'll learn what problems with data are occurring nowadays and how Hadoop can solve them. You'll learn about Hadoop's basic components and the principles that make Hadoop such a great tool.
Facing trouble in distinguishing Big Data, Hadoop & NoSQL as well as finding connection among them? This slide of Savvycom team can definitely help you.
Enjoy reading!
This was presented at NHN on Jan. 27, 2009.
It introduces Big Data, its storages, and its analyses.
Especially, it covers MapReduce debates and hybrid systems of RDBMS and MapReduce.
In addition, in terms of Schema-Free, various non-relational data storages are explained.
Big Data with Hadoop and HDInsight. This is an intro to the technology. If you are new to Big Data or have just heard of it, this presentation will help you learn a little bit more about the technology.
Lacking the technology to directly leverage Hadoop, some companies are foregoing its full benefits opting to treat Hadoop as just another data source for their legacy BI tools. But storage is only one benefit of Hadoop and ignores its linear scalability and data flexibility across all data types. Using Hadoop natively for both storage and computation in an analytic capacity has already led to dramatic increases in business benefits. Hadoop analytics has already identified over $2B in potential fraud at one of the world’s largest credit card companies. Sears has already reduced reporting times over traditional BI from 12 weeks to 3 days. A major internet security company increased customer conversion by 60% and revenue by $20 million. Meaningful returns are spread across Fortune 100 enterprises and fast growing startups with the common thread being self-service big data analytics leveraging Hadoop’s native capabilities. In this talk, we’ll highlight the core value proposition of building analytics natively on Hadoop, share real-world use cases that resulted in dramatic ROI, and reveal the next major step in visual big data analytics.
Neustar is a fast growing provider of enterprise services in telecommunications, online advertising, Internet infrastructure, and advanced technology. Neustar has engaged Think Big Analytics to leverage Hadoop to expand their data analysis capacity. This session describes how Hadoop has expanded their data warehouse capacity, agility for data analysis, reduced costs, and enabled new data products. We look at the challenges and opportunities in capturing 100s of TBs of compact binary network data, ad hoc analysis, integration with a scale-out relational database, more agile data development, and building new products integrating multiple big data sets.
Banks Betting on Big Data Analytics and Real-Time Execution to Better Engage ... - SAP Analytics
Winning new business and satisfying customers are top agenda items in bank boardrooms worldwide. Executives are bullish on new technologies to meet these objectives.
Business Intelligence In Financial Industry - Kartik Mehta
As more and more banking operations and customer transactions are performed on the web, the amount of data is growing exponentially. To get insights into the future, Business Intelligence in the financial industry could be the way forward.
Why Blockchain Matters to Big Data - Big Data London Meetup - Nov 3, 2016 - BigchainDB
Why does blockchain matter to Big Data?
Bruce Pon, CEO and Co-Founder of BigchainDB talks about how blockchain and big data work together.
Follow BigchainDB on LinkedIn, download the whitepaper, or sign up at the IPDB Foundation to get access to a first test network built with BigchainDB and build your own blockchain application.
Commercial banking relates to deposit-taking and lending
They provide services to corporate and individual customers
Some commercial banks have investment banking arms e.g. Bank of America Merrill Lynch
Commercial banks make their profits by taking small, short-term, relatively liquid deposits from retail savers and transforming these into larger, longer maturity loans e.g. in the form of business loans and mortgages
An investment bank provides a wide range of specialized services for companies and large investors
These include
Underwriting and advising on securities issues and other forms of capital raising
Advice on mergers and acquisitions and also corporate restructuring
Trading on capital markets
Research and private equity investments
An investment bank trades and invests on its own account
Investment banks deal mainly with corporate customers
Goldman Sachs and Morgan Stanley are the last remaining major Wall Street investment banking businesses
Commercial banks can provide investment banking services
Big Data Alchemy: How can Banks Maximize the Value of their Customer Data? - Capgemini
This document is a point of view on how banks can maximize the value of their customer data using big data analytics. While the volume of data has been increasing in recent years, many banks have not been able to profit from this growth. Several challenges hold them back. The PoV explores these challenges and suggests actions for banks to scale up to the next level of customer data analytics.
Data is like the new currency, and myriad technologies revolve around the idea of putting Big Data to work while enhancing levels of ROI. Gear up to witness humongous growth of data in 2017, both in variety and volume.
Meaning making – separating signal from noise. How do we transform the customer's next input into an action that creates a positive customer experience? We make the data more intelligent, so that it is able to guide our actions. The Data Lake builds on Big Data strengths by automating many of the manual development tasks, providing several self-service features to end-users, and an intelligent management layer to organize it all. This results in lower cost to create solutions, "smart" analytics, and faster time to business value.
Choosing technologies for a big data solution in the cloud - James Serra
Has your company been building data warehouses for years using SQL Server? And are you now tasked with creating or moving your data warehouse to the cloud and modernizing it to support "Big Data"? What technologies and tools should you use? That is what this presentation will help you answer. First we will cover what questions to ask concerning data (type, size, frequency), reporting, performance needs, on-prem vs cloud, staff technology skills, OSS requirements, cost, and MDM needs. Then we will show you common big data architecture solutions and help you to answer questions such as: Where do I store the data? Should I use a data lake? Do I still need a cube? What about Hadoop/NoSQL? Do I need the power of MPP? Should I build a "logical data warehouse"? What is this lambda architecture? Can I use Hadoop for my DW? Finally, we'll show some architectures of real-world customer big data solutions. Come to this session to get started down the path to making the proper technology choices in moving to the cloud.
Big Data - The 5 Vs Everyone Must Know - Bernard Marr
This slide deck, by Big Data guru Bernard Marr, outlines the 5 Vs of big data. It describes in simple language what big data is, in terms of Volume, Velocity, Variety, Veracity and Value.
With the proliferation of technology, banking customers are living in a connected world with their experience from other industries influencing their expectations from their financial services provider. This has led to an evolving customer-bank relationship necessitating banks to be more customer-centric by embedding themselves in customers’ lives to meet rising customer experience expectations. However, banks have been facing challenges in meeting customer expectations, as they are troubled with legacy challenges both in terms of technology and culture. This document aims to understand and analyze the trends in the banking industry that are expected to drive the dynamics of the banking ecosystem in the near future.
Big data architectures and the data lake - James Serra
With so many new technologies it can get confusing on the best approach to building a big data architecture. The data lake is a great new concept, usually built in Hadoop, but what exactly is it and how does it fit in? In this presentation I'll discuss the four most common patterns in big data production implementations, the top-down vs bottoms-up approach to analytics, and how you can use a data lake and a RDBMS data warehouse together. We will go into detail on the characteristics of a data lake and its benefits, and how you still need to perform the same data governance tasks in a data lake as you do in a data warehouse. Come to this presentation to make sure your data lake does not turn into a data swamp!
Gartner TOP 10 Strategic Technology Trends 2017 - Den Reymer
Gartner TOP 10 Strategic Technology Trends_2017
http://denreymer.com
Artificial Intelligence and Advanced Machine Learning
Intelligent Apps
Intelligent Things
Virtual Reality and Augmented Reality
Digital Twins
Blockchains and Distributed Ledgers
Conversational Systems
Digital Technology Platforms
Mesh App and Service Architecture
Adaptive Security Architecture
This presentation, by big data guru Bernard Marr, outlines in simple terms what Big Data is and how it is used today. It covers the 5 V's of Big Data as well as a number of high value use cases.
ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture - DATAVERSITY
Whether to take data ingestion cycles off the ETL tool and the data warehouse or to facilitate competitive Data Science and building algorithms in the organization, the data lake – a place for unmodeled and vast data – will be provisioned widely in 2020.
Though it doesn’t have to be complicated, the data lake has a few key design points that are critical, and it does need to follow some principles for success. Avoid building the data swamp, but not the data lake! The tool ecosystem is building up around the data lake and soon many will have a robust lake and data warehouse. We will discuss policy to keep them straight, send data to its best platform, and keep users’ confidence up in their data platforms.
Data lakes will be built in cloud object storage. We’ll discuss the options there as well.
Get this data point for your data lake journey.
Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ... - Precisely
Tackling the challenge of designing a machine learning model and putting it into production is the key to getting value back – and the roadblock that stops many promising machine learning projects. After the data scientists have done their part, engineering robust production data pipelines has its own set of challenges. Syncsort software helps the data engineer every step of the way.
Building on the process of finding and matching duplicates to resolve entities, the next step is to set up a continuous streaming flow of data from data sources so that as the sources change, new data automatically gets pushed through the same transformation and cleansing data flow – into the arms of machine learning models.
Some of your sources may already be streaming, but the rest are sitting in transactional databases that change hundreds or thousands of times a day. The challenge is that you can't affect the performance of data sources that run key applications, so putting something like database triggers in place is not the best idea. Using Apache Kafka or similar technologies as the backbone for moving data around doesn't solve the problem of needing to grab changes from the source, push them into Kafka, and consume the data from Kafka to be processed. If something unexpected happens – like connectivity being lost on either the source or the target side – you don't want to have to fix it or start over because the data is out of sync.
View this 15-minute webcast on-demand to learn how to tackle these challenges in large scale production implementations.
Assessing New Databases – Translytical Use Cases - DATAVERSITY
Organizations run their day-in-and-day-out businesses with transactional applications and databases. On the other hand, organizations glean insights and make critical decisions using analytical databases and business intelligence tools.
The transactional workloads are relegated to database engines designed and tuned for transactional high throughput. Meanwhile, the big data generated by all the transactions require analytics platforms to load, store, and analyze volumes of data at high speed, providing timely insights to businesses.
Thus, in conventional information architectures, this requires two different database architectures and platforms: online transactional processing (OLTP) platforms to handle transactional workloads and online analytical processing (OLAP) engines to perform analytics and reporting.
Today, a particular focus and interest of operational analytics includes streaming data ingest and analysis in real time. Some refer to operational analytics as hybrid transaction/analytical processing (HTAP), translytical, or hybrid operational analytic processing (HOAP). We’ll address if this model is a way to create efficiencies in our environments.
This talk provides an architecture overview of data-centric microservices illustrated with an example application. The following Microservices concepts are illustrated - domain driven design, event-driven services, Saga transactions, Application tracing and Health monitoring with different microservices using a variety of data types supported in the database - business data, documents, spatial, graph, and events. A running example of a mobile food delivery application (called GrubDash) is used, with a hands-on-lab that is available for attendees to work through on the Oracle Cloud after these sessions. The rest of the talks will build upon this Microservices architecture framework.
SpringPeople - Introduction to Cloud Computing - SpringPeople
Cloud computing is no longer a fad that is going around. It is for real and is perhaps the most talked about subject. Various players in the cloud eco-system have provided definitions that are closely aligned to their sweet spot – be it infrastructure, platforms or applications.
This presentation will expose participants to a variety of cloud computing techniques, architectures and technology options, and in general will cover cloud fundamentals in a holistic manner spanning dimensions such as cost, operations and technology.
Presentation: Overview of Kognitio, Kognitio Cloud and the Kognitio Analytical Platform
Kognitio is driving the convergence of Big Data, in-memory analytics and cloud computing. Having delivered the first in-memory analytical platform in 1989, it was designed from the ground up to provide the highest amount of scalable compute power to allow rapid execution of complex analytical queries without the administrative overhead of manipulating data. Kognitio software runs on industry-standard x86 servers, or as an appliance, or in Kognitio Cloud, a ready-to-use analytical platform. Kognitio Cloud is a secure, private or public cloud Platform-as-a-Service (PaaS), leveraging the cloud computing model to make the Kognitio Analytical Platform available on a subscription basis. Clients span industries, including market research, consumer packaged goods, retail, telecommunications, financial services, insurance, gaming, media and utilities.
To learn more, visit www.kognitio.com and follow us on Facebook, LinkedIn and Twitter.
ADV Slides: Platforming Your Data for Success – Databases, Hadoop, Managed Ha... - DATAVERSITY
Thirty years is a long time for a technology foundation to be as active as relational databases. Are their replacements here? In this webinar, we say no.
Databases have not sat around while Hadoop emerged. The Hadoop era generated a ton of interest and confusion, but is it still relevant as organizations are deploying cloud storage like a kid in a candy store? We’ll discuss what platforms to use for what data. This is a critical decision that can dictate two to five times additional work effort if it’s a bad fit.
Drop the herd mentality. In reality, there is no “one size fits all” right now. We need to make our platform decisions amidst this backdrop.
This webinar will distinguish these analytic deployment options and help you platform 2020 and beyond for success.
Kai Wähner – Real World Use Cases for Realtime In-Memory Computing - NoSQL ma... - NoSQLmatters
Kai Wähner – Real World Use Cases for Realtime In-Memory Computing
NoSQL is not just about different storage alternatives such as document stores, key-value stores, graphs or column-based databases. The hardware is also getting much more important. Besides common disks and SSDs, enterprises are beginning to use in-memory storage more and more, because a distributed in-memory data grid provides very fast data access and update. While its performance will vary depending on multiple factors, it is not uncommon to be 100 times faster than corresponding database implementations. For this reason and others described in this session, in-memory computing is a great solution for lifting the burden of big data, reducing reliance on costly transactional systems, and building highly scalable, fault-tolerant applications. The session begins with a short introduction to in-memory computing. Afterwards, different frameworks and product alternatives are discussed for implementing in-memory solutions. Finally, the main part of this session shows several different real-world use cases where in-memory computing delivers business value by supercharging the infrastructure.
Continuous Availability and Scale-out for MySQL with ScaleBase Lite & Enterpr... - Vladi Vexler
Continuous Availability and Scalability with ScaleBase Lite and ScaleBase
Abstract:
Businesses are driven by data and processes. Ensuring database availability during unexpected outages, continuous operations during maintenance, and webscale scalability are key to a major positive impact on businesses.
ScaleBase and ScaleBase Lite distributed database management systems ensure business continuity during unexpected and expected outages with automated failover and failback capabilities, enabling five-nines of availability (99.999%). Additional functionalities, such as load balancing and data distribution further increase performance and throughput capacity for more users and more data management.
This webinar will review and discuss:
1. The lifecycle and the challenges of webscale databases
2. Availability challenges in public, private and hybrid clouds
3. Introduction to ScaleBase Lite – instant and transparent MySQL Scale-out by intelligent load balancing (read/write splitting) and continuous availability
4. Scale further with ScaleBase – Massive scale out to distributed database containing 10s and 100s of servers
(Webinar Dec 17 2014)
Low-Latency Analytics with NoSQL – Introduction to Storm and Cassandra - Caserta
Businesses are generating and ingesting an unprecedented volume of structured and unstructured data to be analyzed. Needed is a scalable Big Data infrastructure that processes and parses extremely high volume in real-time and calculates aggregations and statistics. Banking trade data where volumes can exceed billions of messages a day is a perfect example.
Firms are fast approaching 'the wall' in terms of scalability with relational databases, and must stop imposing relational structure on analytics data; instead they need to map raw trade data to a data model at low latency, persist the mapped data to disk, and handle ad-hoc requests for data analytics.
Joe discusses and introduces NoSQL databases, describing how they are capable of scaling far beyond relational databases while maintaining performance, and shares a real-world case study that details the architecture and technologies needed to ingest high-volume data for real-time analytics.
For more information, visit www.casertaconcepts.com
PayPal datalake journey | teradata - edge of next | san diego | 2017 october ... - Deepak Chandramouli
PayPal Data Lake Journey | 2017-Oct | San Diego | Teradata Edge of Next
Gimel [http://www.gimel.io] is a Big Data Processing Library, open sourced by PayPal.
https://www.youtube.com/watch?v=52PdNno_9cU&t=3s
Gimel empowers analysts, scientists, data engineers alike to access a variety of Big Data / Traditional Data Stores - with just SQL or a single line of code (Unified Data API).
This is possible via the Catalog of Technical properties abstracted from users, along with a rich collection of Data Store Connectors available in Gimel Library.
A Catalog provider can be Hive or User Supplied (runtime) or UDC.
In addition, PayPal recently open sourced UDC [Unified Data Catalog], which can host and serve the Technical Metadata of Data Stores & Objects. Visit http://www.unifieddatacatalog.io to experience it first hand.
Similar to Dev Lakhani, Data Scientist at Batch Insights "Real Time Big Data Applications for Investment Banks & Financial Institutions" (20)
Data Natives Frankfurt v 11.0 | "Competitive advantages with knowledge graphs... - Dataconomy Media
The challenges of increasing complexity of organizations, companies and projects are obvious and omnipresent. Everywhere there are connections and dependencies that are often not adequately managed or not considered at all because of a lack of technology or expertise to uncover and leverage the relationships in data and information. In his presentation, Axel Morgner talks about graph technology and knowledge graphs as indispensable building blocks for successful companies.
Data Natives Munich v 12.0 | "How to be more productive with Autonomous Data ...Dataconomy Media
Every day we are challenged with more data, more use cases and an ever increasing demand for analytics. In this talk Bjorn will explain how autonomous data management and machine learning help innovators be more productive, and give examples of how to deliver new data-driven projects with less risk at lower cost.
Data Natives meets DataRobot | "Build and deploy an anti-money laundering mo... - Dataconomy Media
Compliance departments within banks and other financial institutions are turning to machine learning for improving their Anti Money Laundering compliance activities. Today, the systems that aim to detect potentially suspicious activity are commonly rule-based, and suffer from ultra-high false positive rates. DataRobot will discuss how their Automated Machine Learning platform was successfully used for a real use case to reduce their false positives and to enhance their Anti-Money Laundering activities.
Data Natives Munich v 12.0 | "Political Data Science: A tale of Fake News, So...Dataconomy Media
Trump, Brexit, Cambridge Analytica... In the last few years, we have had to confront the consequences of the use and misuse of data science algorithms in manipulating public opinion through social media. The use of private data to microtarget individuals is a daily practice (and a trillion-dollar industry), which has serious side-effects when the selling product is your political ideology. How can we cope with this new scenario?
Data Natives Vienna v 7.0 | "The Ingredients of Data Innovation" - Robbert de... - Dataconomy Media
When taking a deep dive into the world of data, one thing is certain: the ultimate goal is to create something new, something better, something faster. In other words, innovation should always be at the forefront of companies' strategic outlook, whether their goal is to pioneer new processes, user experiences, products or services.
Data Natives Cologne v 4.0 | "The Data Lorax: Planting the Seeds of Fairness...Dataconomy Media
What does it take to build a good data product or service? Data practitioners always think about the technology, user experience and commercial viability. But rarely do they think about the implications of the systems they build. This talk will shed light on the impact of AI systems and the unintended consequences of the use of data in different products. It will also discuss our role, as data practitioners, in planting the seeds of fairness in the systems we build.
Data Natives Cologne v 4.0 | "How People Analytics Can Reveal the Hidden Aspe...Dataconomy Media
We all hear about the power of data, big data and data analysis in today's marketplace, but we rarely feel their tangible effects on our own business decisions and performance.
Let's dive into it and see how people analytics can increase people performance, motivation and business revenue.
Data Natives Amsterdam v 9.0 | "Ten Little Servers: A Story of no Downtime" -...Dataconomy Media
Cloud Infrastructure is a hostile environment: a power supply failure or a network outage leads to downtime and big losses. There is nothing we can trust: a single server, a server rack, even a whole datacenter can fail, and if an application is fragile by design, disruption is inevitable. We must distribute our application and diversify our cloud data strategy to survive disturbances of any scale. Apache Cassandra is a cloud-native, platform-agnostic database that stores data with distributed redundancy so it easily survives any issue. Want to know how Apple and Netflix handle petabytes of data, keeping it highly available? Join us and listen to a story of 10 little servers and no downtime!
Data Natives Amsterdam v 9.0 | "Point in Time Labeling at Scale" - Timothy Th...Dataconomy Media
In the data industry, having correctly labelled datasets is vital. Timothy Thatcher explains how tagging your data while considering time and location and complex hierarchical rules at scale can be handled.
Data Natives Berlin v 20.0 | "Serving A/B experimentation platform end-to-end"... - Dataconomy Media
During the lifetime of an A/B test, product managers and analysts at GetYourGuide require various tools and different kinds of data to plan the trial properly, control it during the run and analyze the results at the end. This talk is about the architecture, tools and data flow for serving their needs.
Data Natives Berlin v 20.0 | "Ten Little Servers: A Story of no Downtime" - A...Dataconomy Media
Cloud Infrastructure is a hostile environment: a power supply failure or a network outage leads to downtime and big losses. There is nothing we can trust: a single server, a server rack, even a whole datacenter can fail, and if an application is fragile by design, disruption is inevitable. We must distribute our application and diversify our cloud data strategy to survive disturbances of any scale. Apache Cassandra is a cloud-native, platform-agnostic database that stores data with distributed redundancy so it easily survives any issue. Want to know how Apple and Netflix handle petabytes of data, keeping it highly available? Join us and listen to a story of 10 little servers and no downtime!
Big Data Frankfurt meets Thinkport | "The Cloud as a Driver of Innovation" - ... - Dataconomy Media
Creativity is the mental ability to create new ideas and designs. Innovation, on the other hand, means developing useful solutions from new ideas. Creativity can be goal-oriented, whereas innovation is always goal-oriented: innovation aims to achieve defined goals. The use of cloud services and technologies promises enterprise users many benefits in terms of more flexible use of IT resources and faster access to innovative solutions. That's why, in this talk, we want to examine the question of what role cloud computing plays for innovation in companies.
Thinkport meets Frankfurt | "Financial Time Series Analysis using Wavelets" -... - Dataconomy Media
A presentation of the time series properties of financial instruments and the possibilities for frequency decomposition and information extraction using FT, STFT and wavelets, with an outlook on current research on wavelet neural networks.
Big Data Helsinki v 3 | "Distributed Machine and Deep Learning at Scale with ...Dataconomy Media
"With most machine learning (ML) and deep learning (DL) frameworks, it can take hours to move data for ETL, and hours to train models. It's also hard to scale, with data sets increasingly being larger than the capacity of any single server. The amount of the data also makes it hard to incrementally test and retrain models in near real-time.
Learn how Apache Ignite and GridGain help to address limitations like ETL costs, scaling issues and Time-To-Market for the new models and help achieve near-real-time, continuous learning.
Yuriy Babak, the head of ML/DL framework development at GridGain and Apache Ignite committer, will explain how ML/DL work with Apache Ignite, and how to get started.
Topics include:
— Overview of distributed ML/DL including architecture, implementation, usage patterns, pros and cons
— Overview of Apache Ignite ML/DL, including built-in ML/DL algorithms, and how to implement your own
— Model inference with Apache Ignite, including how to train models with other libraries, like Apache Spark, and deploy them in Ignite
— How Apache Ignite and TensorFlow can be used together to build distributed DL model training and inference"
Big Data Helsinki v 3 | "Federated Learning and Privacy-preserving AI" - Oguz...Dataconomy Media
"Machine learning algorithms require significant amounts of training data which has been centralized on one machine or in a datacenter so far. For numerous applications, such need of collecting data can be extremely privacy-invasive. Recent advancements in AI research approach this issue by a new paradigm of training AI models, i.e., Federated Learning.
In federated learning, edge devices (phones, computers, cars etc.) collaboratively learn a shared AI model while keeping all the training data on device, decoupling the ability to do machine learning from the need to store the data in the cloud. From personal data perspective, this paradigm enables a way of training a model on the device without directly inspecting users’ data on a server. This talk will pinpoint several examples of AI applications benefiting from federated learning and the likely future of privacy-aware systems."
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ... - Subhajit Sahu
Abstract — Levelwise PageRank is an alternative method of PageRank computation which decomposes the input graph into a directed acyclic block-graph of strongly connected components, and processes them in topological order, one level at a time. This enables calculation of ranks in a distributed fashion without per-iteration communication, unlike the standard method where all vertices are processed in each iteration. It however comes with a precondition of the absence of dead ends in the input graph. Here, the native non-distributed performance of Levelwise PageRank was compared against Monolithic PageRank on a CPU as well as a GPU. To ensure a fair comparison, Monolithic PageRank was also performed on a graph where vertices were split by components. Results indicate that Levelwise PageRank is about as fast as Monolithic PageRank on the CPU, but quite a bit slower on the GPU. The slowdown on the GPU is likely caused by a large submission of small workloads, and is expected to be a non-issue when the computation is performed on massive graphs.
Analysis insight about a Flyball dog competition team's performance - roli9797
Insights from my analysis of a Flyball dog competition team's performance over the last year. Find more: https://github.com/rolandnagy-ds/flyball_race_analysis/tree/main
Unleashing the Power of Data_ Choosing a Trusted Analytics Platform.pdf - Enterprise Wired
In this guide, we'll explore the key considerations and features to look for when choosing a trusted analytics platform that meets your organization's needs and delivers actionable intelligence you can trust.
Adjusting OpenMP PageRank : SHORT REPORT / NOTES - Subhajit Sahu
For massive graphs that fit in RAM, but not in GPU memory, it is possible to take advantage of a shared memory system with multiple CPUs, each with multiple cores, to accelerate PageRank computation. If the NUMA architecture of the system is properly taken into account with good vertex partitioning, the speedup can be significant. To take steps in this direction, experiments are conducted to implement PageRank in OpenMP using two different approaches, uniform and hybrid. The uniform approach runs all primitives required for PageRank in OpenMP mode (with multiple threads). On the other hand, the hybrid approach runs certain primitives in sequential mode (i.e., sumAt, multiply).
Adjusting primitives for graph : SHORT REPORT / NOTES - Subhajit Sahu
Graph algorithms, like PageRank, operate on Compressed Sparse Row (CSR), an adjacency-list based graph representation. The notes cover the following experiments:
Multiply with different modes (map)
1. Performance of sequential execution based vs OpenMP based vector multiply.
2. Comparing various launch configs for CUDA based vector multiply.
Sum with different storage types (reduce)
1. Performance of vector element sum using float vs bfloat16 as the storage type.
Sum with different modes (reduce)
1. Performance of sequential execution based vs OpenMP based vector element sum.
2. Performance of memcpy vs in-place based CUDA based vector element sum.
3. Comparing various launch configs for CUDA based vector element sum (memcpy).
4. Comparing various launch configs for CUDA based vector element sum (in-place).
Sum with in-place strategies of CUDA mode (reduce)
1. Comparing various launch configs for CUDA based vector element sum (in-place).
Dev Lakhani, Data Scientist at Batch Insights "Real Time Big Data Applications for Investment Banks & Financial Institutions"
1. Real Time Big Data Applications for Investment Banks & Financial Institutions
2. Dev Lakhani
• 15 years Software Architecture & Development Experience
• 7 Years of Big Data Experience
• Big Data Architectures for Banks, Telecom, Retail, Media
• Deutsche Telekom
• ASOS
• Tier 1 Investment Banks in Canary Wharf
• Dentsu Aegis
• Contributor to Hadoop, Spark, Tachyon, HBase, Ignite
• uk.linkedin.com/in/devlakhani
3. • Overview of Big Data in financial institutions
• Architectural constraints in investment banking
• Implementation challenges
• Data model
• Future for financial applications
Introduction
4. • This talk has a technical focus
• This presentation is not representative of any client
• Real time re-definition for Big Data
• Vendor neutral talk
Disclaimers
5. Real Time Definition
[AS MODIFIER] Computing Relating to a system in which input data is processed within milliseconds so that it is available virtually immediately as feedback to the process from which it is coming, e.g. in a missile guidance system: real-time signal processing; real-time software
http://www.oxforddictionaries.com/definition/english/real-time
6. Real Time Definition (Modified)
[AS MODIFIER] Computing Relating to a system in which input data is processed within a guaranteed response time, using up-to-date (latest version) information and available on demand as feedback to the process from which it is coming.
8. Big Data Drivers for Investment Banking & Financial Institutions
• Capturing billions of trades
• Quantifying risk and exposure
• Regulatory requirements
• Response to news and events
• Detect fraud, rogue trading and anomalies
• Performing simulations & algorithmic trading
• Business analysis - P&L
• Capital reserves and forecasting
Why Use Big Data?
10. • Disaster avoidance (not recovery) through replication and redundancy
• High availability
• "Chinese Wall" policy and segmentation of information
• Within the bank
• External to the bank
• Security & role based segmentation
• Responsiveness and throughput
• API or service based architecture, transparent to quants/end users
• Data completeness: 1 lost trade = $1 < x < $10 million in VaR estimate
Constraints
11. • Distributed File System, ingest raw data
• Regulatory compliance & archiving
• Last option disaster recovery
• Direct access to "power-users" for modelling and analysis
Big Data Solution Architecture Components
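As a rough illustration of the ingest-and-archive role on the slide above, the sketch below copies a local raw trade file into HDFS with the Hadoop FileSystem API. It is a minimal sketch under assumptions: the namenode address, file paths and directory layout are hypothetical and not taken from the talk.

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// Minimal raw-data ingest sketch: copy a local trade file into HDFS for archiving.
// The namenode address and paths below are hypothetical placeholders.
object IngestRawTrades {
  def main(args: Array[String]): Unit = {
    val conf = new Configuration()
    conf.set("fs.defaultFS", "hdfs://namenode:8020") // assumed namenode address
    val fs = FileSystem.get(conf)

    val local  = new Path("/data/incoming/trades-2015-06-01.csv")   // hypothetical local file
    val remote = new Path("/archive/trades/2015/06/01/trades.csv")  // hypothetical HDFS layout

    fs.mkdirs(remote.getParent)          // ensure the date-partitioned directory exists
    fs.copyFromLocalFile(local, remote)  // ingest; HDFS replication provides the redundancy
    fs.close()
  }
}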
12. • Distributed Warehouse
• Not always highly transactional
• Trading exchange worries about the trade/transaction
• Eventually consistent sufficient
• SQL vs No-SQL
• MPP (Massively Parallel Processing)
• In memory vs on disk tuning
Big Data Solution Architecture Components
13. • Analytics and Serving Layer
• Perform descriptive stats
• Trade summaries
• Risk Calculation
• Monte Carlo Simulation
• Machine learning
• Expose APIs
• Report/Aggregate/Present
Big Data Solution Architecture Components
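A minimal sketch of the "descriptive stats / trade summaries" idea on the slide above, using Spark's built-in stats() over a set of notionals. The values and the choice of notional as the summarised field are illustrative assumptions, not figures from the talk.

import org.apache.spark.sql.SparkSession

// Illustrative descriptive statistics over hypothetical trade notionals.
object TradeSummaries {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("TradeSummaries").master("local[*]").getOrCreate()
    val sc = spark.sparkContext

    // Stand-in notionals; in practice these would come from the distributed warehouse.
    val notionals = sc.parallelize(Seq(1.2e6, 0.8e6, 3.4e6, 2.1e6, 0.5e6))

    // stats() computes count, mean, stdev, max and min in a single distributed pass.
    val s = notionals.stats()
    println(f"trades=${s.count} mean=£${s.mean}%.0f stdev=£${s.stdev}%.0f max=£${s.max}%.0f")

    spark.stop()
  }
}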
14. Physical Processes and Daemons
• HDFS
• Datanodes - store the data
• Journalnodes - shared edits (HA)
• Primary and Secondary namenode (HA)
• Zookeeper - coordinate between Namenodes
• YARN
• Resource manager x 2
• Node managers x (number of nodes)
• Job history servers
Lower Level Architecture Components
15. Physical Processes and Daemons
• HBase (1.0.0)
• N x HBase zookeepers
• 2 x HBase masters
• 2 x HBase master regionservers
• N x Regionservers
• Spark
• Master (No HA)
• N x slaves
• Monitoring
• JMX monitoring
Lower Level Architecture Components
17. • Estimate Value at Risk
• Over a given timeframe: week, month, year
• A confidence level: 95%-99%
• A loss amount, e.g. £1m
What is the maximum potential loss >£1m over that time?
• Using Spark, calculate the covariance matrix of past returns
• Use RDDs and parallel data structures to simulate various conditions
• Sum, aggregate and take bottom 5%
Analytics, Machine Learning & Simulation
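A minimal Monte Carlo VaR sketch in the spirit of the slide above, using plain Spark RDDs. The portfolio value, return distribution and trial count are illustrative assumptions; a real implementation would draw correlated returns from the covariance matrix of past returns mentioned on the slide rather than a single normal distribution.

import org.apache.spark.sql.SparkSession
import scala.util.Random

// Minimal Monte Carlo Value-at-Risk sketch (illustrative figures only).
object VaRSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("VaRSketch").master("local[*]").getOrCreate()
    val sc = spark.sparkContext

    val portfolioValue = 1.0e6   // hypothetical £1m portfolio
    val meanReturn     = 0.0005  // assumed daily mean return
    val stdevReturn    = 0.02    // assumed daily volatility
    val numTrials      = 1000000

    // Simulate daily P&L in parallel across the cluster.
    val pnl = sc.parallelize(0 until numTrials, numSlices = 200).map { _ =>
      val simulatedReturn = meanReturn + stdevReturn * Random.nextGaussian()
      portfolioValue * simulatedReturn
    }

    // 95% VaR: the loss at the 5th percentile of the simulated P&L distribution.
    val tail  = pnl.takeOrdered((numTrials * 0.05).toInt)
    val var95 = -tail.last
    println(f"Estimated 1-day 95%% VaR: £$var95%.0f")

    spark.stop()
  }
}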
19. • Keys have to be distributed evenly
• Encoding and compression choices have to be made
• LZO, GZ, Snappy, Codecs
• Serialization choices and memory tuning
• Java objects/JSON objects/JSON to Java
• Replication has to be managed and tested
• Cross cluster replication
• Cross data center replication
• Availability throughput during replication
• Rolling restarts and upgrades
Performance Challenges
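One common way to keep keys evenly distributed, as the first point on the slide above calls for, is to salt the row key. The sketch below shows the idea with a hypothetical trade-id key format and bucket count; the talk itself does not prescribe a specific scheme.

// Minimal row-key salting sketch; the bucket count and key format are assumptions.
object SaltedKeys {
  val NumBuckets = 16

  // Prefix the natural key (e.g. a trade id) with a deterministic salt so writes
  // spread across regions/partitions instead of hot-spotting a single one.
  def saltedRowKey(naturalKey: String): String = {
    val bucket = Math.floorMod(naturalKey.hashCode, NumBuckets)
    f"$bucket%02d-$naturalKey"
  }

  def main(args: Array[String]): Unit = {
    Seq("TRADE-000123", "TRADE-000124", "TRADE-000125")
      .map(k => k -> saltedRowKey(k))
      .foreach { case (k, s) => println(s"$k -> $s") }
  }
}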
20. • In memory tuning, off heap and on heap, region sizes
• Java tuning, heap, permgen, generation (for 20+ daemons!)
• HBase requires a functioning and performant HDFS cluster
• Cassandra requires tuning for compaction, replication
• Spark needs correct partitioning and persistence strategies
• Allocation of resources to nodes, network, disk etc.
• Role and table based segmentation - maintaining the Chinese Wall
Performance Challenges
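To illustrate the Spark partitioning and persistence point on the slide above, here is a small sketch that co-partitions a keyed RDD and caches it in serialized form. The desk/notional data and the partition count are made-up examples, not values from the talk.

import org.apache.spark.HashPartitioner
import org.apache.spark.sql.SparkSession
import org.apache.spark.storage.StorageLevel

// Illustrative partitioning and persistence choices for a keyed RDD of trades.
object PartitionAndPersist {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("PartitionAndPersist").master("local[*]").getOrCreate()
    val sc = spark.sparkContext

    // Hypothetical (desk, notional) pairs standing in for trade records.
    val trades = sc.parallelize(Seq(("FX", 1.2e6), ("RATES", 3.4e6), ("FX", 0.8e6), ("EQ", 2.1e6)))

    // Co-locate records for the same desk and cache them in serialized form,
    // spilling to disk rather than recomputing when memory is tight.
    val byDesk = trades
      .partitionBy(new HashPartitioner(8))
      .persist(StorageLevel.MEMORY_AND_DISK_SER)

    byDesk.reduceByKey(_ + _).collect().foreach {
      case (desk, total) => println(f"$desk: £$total%.0f")
    }

    spark.stop()
  }
}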
21. Once you solve that...
• Distributed File System for ingested/archived data
• MPP warehouse for querying and analytics
• Quant layer for machine learning and prediction
• Service layer to expose APIs for VaR, stress tests
• Response guarantees for real time Big Data