This document provides an overview of big data concepts from the perspective of an enterprise data architect. It discusses the data journey from acquisition to analytics and highlights best practices for data quality, sandboxing data on the cloud, balancing real-time and batch analytics, and putting these concepts together under a tiered data architecture. The architecture proposes investing more in fast data sources and less in cold data, using data virtualization to provide a unified view, and keeping sensitive customer data localized to comply with regulations.
1. Big Data from the trenches
Advice from the FSI industry
By: Azrul Madisa
2. About me…
• VP – Enterprise Data Architect @ Maybank
• Take care of Maybank’s data worldwide
• Nuts about data, analytics and software dev.
• Very hands-on, love to read
• Teach aikido to kids
3. Big Data landscape today
https://www.linkedin.com/pulse/big-data-still-thing-2016-landscape-matt-turck
4. Too many big data tech?
Wait … what? I have to know ALL that?
8. Example: credit scoring and loan origination
[Diagram: the data journey for loan origination – application screens feed acquisition; raw data is dumped into a data staging area; tidy data lands in the data warehouse; a sandbox serves the data scientist and the score card builder, whose analytical model drives real-time analytics and decisioning]
10. Acquisition with quality
• Manage data quality up front
• Human-factor data quality
[Diagram: Data Entry → Application → overnight batch → Data Staging]
11. Acquisition with quality
• Manage data quality up front
• Human-factor data quality
[Diagram: Data Entry → Application → overnight batch → Data Staging, with a weekly audit trail]
12. Acquisition with quality
• Non-human error
• Use the PEWMA algorithm (see the sketch below)
https://aws.amazon.com/blogs/iot/anomaly-detection-using-aws-iot-and-aws-lambda/
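For context, here is a minimal sketch of the PEWMA (Probabilistic Exponentially Weighted Moving Average) update in Python. It is illustrative only: it assumes the standard PEWMA formulation (a probability-weighted EWMA), does not reproduce the AWS IoT / Lambda wiring from the linked post, and the warm-up length and 3-sigma threshold are arbitrary choices rather than anything from the slide.

```python
import math

def pewma(stream, alpha=0.98, beta=0.5, warmup=30, z_thresh=3.0):
    """Flag anomalies in a numeric stream with PEWMA.

    PEWMA dampens the usual EWMA update when an observation is
    improbable under the current estimate, so a one-off spike is
    flagged without dragging the running mean after it.
    Parameter values here are illustrative, not tuned."""
    s1, s2 = stream[0], stream[0] ** 2  # running first and second moments
    anomalies = []
    for t, x in enumerate(stream):
        std = math.sqrt(max(s2 - s1 ** 2, 1e-12))
        z = (x - s1) / std
        # Likelihood of x under the current Gaussian estimate
        p = math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)
        # Plain running average during warm-up, then probability-weighted
        a = 1 - 1.0 / (t + 1) if t < warmup else alpha * (1 - beta * p)
        s1 = a * s1 + (1 - a) * x
        s2 = a * s2 + (1 - a) * x * x
        if t >= warmup and abs(z) > z_thresh:
            anomalies.append(t)
    return anomalies
```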
14. Creating a sandbox on the cloud
• Why cloud:
– Scale data discovery as needed
– Merging private with public data
– Less bureaucratic
• But…
– Customer data on the cloud is a no-no
15. Creating a sandbox on the cloud
• Masking
– Non-numerical data => No sweat!
– E.g.
• En. Abdul Jalil => 837x2unxy237e832!@
• 720324-03-8891 => 472376-84-8732
• Masking numerical data?
16. Creating a sandbox on the cloud
• Masking
– Non-numerical data => No sweat!
– E.g.
• En. Abdul Jalil => 837x2unxy237e832!@
• 720324-03-8891 => 472376-84-8732
• Masking numerical data?
What if there were a way to mask numerical data while keeping its statistical properties intact?
Easier for the regulators to digest.
17. Creating a sandbox on the cloud
• Random projection
• Usually used for dimension reduction
Original data (M × N) × Random matrix (N × N) = Masked data (M × N)
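As a minimal sketch, assuming the random matrix is drawn orthogonal (the slide does not specify how it is generated), the masking might look like this in Python; the function name and seed handling are illustrative:

```python
import numpy as np

def mask_numeric(data: np.ndarray, seed: int = 42) -> np.ndarray:
    """Mask an (M x N) numeric matrix by right-multiplying with an
    (N x N) random orthogonal matrix.

    An orthogonal rotation preserves Euclidean distances and inner
    products between rows, so distance-based analytics still work
    on the masked data while the original column values are hidden."""
    _, n = data.shape
    rng = np.random.default_rng(seed)
    # QR decomposition of a Gaussian matrix gives a random orthogonal Q
    q, _ = np.linalg.qr(rng.standard_normal((n, n)))
    return data @ q

# Quick check: pairwise distances survive the masking
original = np.random.default_rng(0).normal(size=(5, 4))
masked = mask_numeric(original)
d_before = np.linalg.norm(original[0] - original[1])
d_after = np.linalg.norm(masked[0] - masked[1])
assert abs(d_before - d_after) < 1e-9
```

The seed (in effect, the random matrix) has to stay on-premise: anyone who holds Q can invert the masking with a single transpose.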
19. Fast real-time analytics
• ‘Batch’ analytics:
[Diagram: User → Application → overnight batch → data warehouse → descriptive and predictive analytics → analytical model, refreshed monthly]
20. Fast real-time analytics
• ‘Batch’ analytics:
[Diagram: the same batch flow, with the monthly-refreshed output now feeding real-time decisioning]
21. Fast real-time analytics
• So what is real time analytics:
[Diagram: User → Application → real-time decisioning analytics, with the analytical model updated in real time]
22. Fast real-time analytics
• So what is real time analytics:
[Diagram: User → Application → real-time analytics and decisioning; predictive analytics maintains both a batch analytical model and a real-time analytical model]
23. Fast real-time analytics
• Q-learning
• E.g. SMS advertisement campaign
[Diagram: location and user info flow into a real-time analytical marketing system, which sends out the SMS campaign]
24. Fast real-time analytics
• Q-learning
• E.g. SMS advertisement campaign
[Diagram: the user changes behaviour (e.g. buys something else) and the real-time analytical marketing system learns the new behaviour]
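To make the slide concrete, below is a minimal, hypothetical sketch of tabular Q-learning for campaign selection. The states, actions, and reward signal are invented for illustration; a real system would derive them from live location and user data:

```python
import random
from collections import defaultdict

# Hypothetical setup: choose which SMS campaign to send for a given
# user context, and learn from whether the user responds.
ACTIONS = ["concert_promo", "movie_promo", "sports_promo"]
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1  # learning rate, discount, exploration

q_table = defaultdict(float)  # (state, action) -> estimated value

def choose_campaign(state):
    """Epsilon-greedy: mostly exploit the best-known campaign, but
    keep exploring so that shifting interests are picked up."""
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q_table[(state, a)])

def learn(state, action, reward, next_state):
    """Standard Q-learning update from one observed interaction
    (reward = 1.0 if the user responded to the SMS, else 0.0)."""
    best_next = max(q_table[(next_state, a)] for a in ACTIONS)
    q_table[(state, action)] += ALPHA * (
        reward + GAMMA * best_next - q_table[(state, action)])

# One example interaction with an invented context and reward
state = "weekend:near_stadium"
action = choose_campaign(state)
learn(state, action, reward=1.0 if action == "sports_promo" else 0.0,
      next_state=state)
```

Because the table is updated per interaction, a change in behaviour shows up as falling rewards for the old campaign, and the policy shifts accordingly, which is what slide 25 illustrates.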
25. Fast real-time analytics: real-time analytics in action
[Chart: a user’s interest in concerts, movies, and sports shifting over time]
29. Data architecture
• Some difficult questions around big data and analytics
– How can I invest in big data while managing cost?
– How can I “experiment” with big data while mitigating risks?
– How can I create a 360 view of data without boiling the ocean?
– How can I use overseas data without violating regulations?
30. Tiered data architecture
[Diagram: data sources load the data warehouse (staging, SQL access) in batch; real-time sources feed a real-time store on big data infrastructure (e.g. Hadoop); master/reference data, social/cloud public data, and overseas data (loaded in batch from overseas sources and social networks) sit in further tiers]
31. Tiered data architecture
[Diagram: the same tiers, now fronted by a data virtualization layer that exposes the official data model to data consumers over SQL / REST / SOAP / MQ]
32. Tiered data architecture
• Investment / level of support
Tier                  Investment      Level of support
Master data           CPU / memory    Level 1
Fast data             CPU / memory    Level 1
Hot data              storage         Level 2
Cold data             storage         Level 3
Data virtualization   –               Level 1
33. Tiered data architecture
• Invest where it matters
– Defer investment if needed
– Refocus investment without disrupting business
• Data virtualization (see the sketch after this list)
– Create a façade for data access
– Provide a standard interface for data
– Single data model, single access, single quality checkpoint
• Allow ‘experimentation’
– E.g. cut-off point for hot / cold
• Overseas data access
– Data stays where it is; only aggregated data is transferred back
– More palatable to regulators
• 360 view
– Data can be ‘joined’ through the data virtualization layer – no laborious ETL needed
• Single place to check for data quality
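To make the façade idea concrete, here is a minimal, hypothetical sketch of a data virtualization layer in Python. The source names, loaders, and join key are invented, and a real deployment would use a virtualization product rather than hand-rolled code, but the shape is the same: one access point, one data model, sources joined on demand with no ETL:

```python
from typing import Callable, Dict
import pandas as pd

class DataVirtualizer:
    """Toy façade: registers data sources behind a single interface
    and joins them at read time, so consumers never touch the
    underlying warehouse, Hadoop cluster, or overseas systems."""

    def __init__(self) -> None:
        self._sources: Dict[str, Callable[[], pd.DataFrame]] = {}

    def register(self, name: str, loader: Callable[[], pd.DataFrame]) -> None:
        # A loader could wrap a SQL query, a REST call, or an MQ fetch
        self._sources[name] = loader

    def query(self, left: str, right: str, on: str) -> pd.DataFrame:
        # 'Join' across tiers on demand - no ETL pipeline required
        return self._sources[left]().merge(self._sources[right](), on=on)

dv = DataVirtualizer()
# Hypothetical sources: local warehouse rows and overseas aggregates
dv.register("warehouse", lambda: pd.DataFrame(
    {"cust_id": [1, 2], "segment": ["retail", "sme"]}))
dv.register("overseas_agg", lambda: pd.DataFrame(
    {"cust_id": [1, 2], "total_spend": [120.0, 890.0]}))
print(dv.query("warehouse", "overseas_agg", on="cust_id"))
```

Note how the overseas source exposes only aggregates, matching the slide’s point that detailed data stays in-country and only aggregated data crosses the border.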
34. That’s all folks…
• LinkedIn:
– https://www.linkedin.com/in/azrul-madisa-6052419