The document provides an overview of IBM's big data and analytics capabilities. It discusses what big data is and the characteristics of big data: volume, velocity, variety, and veracity. It then covers IBM's big data platform, which includes products like InfoSphere Data Explorer, InfoSphere BigInsights, IBM PureData Systems, and InfoSphere Streams. Example use cases of big data are also presented.
Building an Effective Data Warehouse Architecture - James Serra
Why use a data warehouse? What is the best methodology to use when creating a data warehouse? Should I use a normalized or dimensional approach? What is the difference between the Kimball and Inmon methodologies? Does the new Tabular model in SQL Server 2012 change things? What is the difference between a data warehouse and a data mart? Is there hardware that is optimized for a data warehouse? What if I have a ton of data? During this session James will help you to answer these questions.
Big data architectures and the data lake - James Serra
With so many new technologies it can get confusing to pick the best approach to building a big data architecture. The data lake is a great new concept, usually built in Hadoop, but what exactly is it and how does it fit in? In this presentation I'll discuss the four most common patterns in big data production implementations, the top-down vs bottom-up approach to analytics, and how you can use a data lake and an RDBMS data warehouse together. We will go into detail on the characteristics of a data lake and its benefits, and how you still need to perform the same data governance tasks in a data lake as you do in a data warehouse. Come to this presentation to make sure your data lake does not turn into a data swamp!
Video and slides synchronized; mp3 and slide download available at https://bit.ly/2OUz6dt.
Chris Riccomini talks about the current state-of-the-art in data pipelines and data warehousing, and shares some of the solutions to current problems dealing with data streaming and warehousing. Filmed at qconsf.com.
Chris Riccomini works as a Software Engineer at WePay.
The right architecture is key for any IT project. This is especially the case for big data projects, where there are no standard architectures that have proven their suitability over the years. This session discusses the different big data architectures that have evolved over time, including the traditional big data architecture, the streaming analytics architecture, and the Lambda and Kappa architectures, and presents the mapping of components from both open source and the Oracle stack onto these architectures.
At wetter.com we build analytical B2B data products and heavily use Spark and AWS technologies for data processing and analytics. I explain why we moved from AWS EMR to Databricks and Delta, and share our experiences from different angles like architecture, application logic, and user experience. We will look at how security, cluster configuration, resource consumption, and workflows changed by using Databricks clusters, as well as how using Delta tables simplified our application logic and data operations.
Data Con LA 2020
Description
In this session, I introduce the Amazon Redshift lake house architecture which enables you to query data across your data warehouse, data lake, and operational databases to gain faster and deeper insights. With a lake house architecture, you can store data in open file formats in your Amazon S3 data lake.
Speaker
Antje Barth, Amazon Web Services, Sr. Developer Advocate, AI and Machine Learning
Modern Data Warehousing with the Microsoft Analytics Platform System - James Serra
The traditional data warehouse has served us well for many years, but new trends are causing it to break in four different ways: data growth, fast query expectations from users, non-relational/unstructured data, and cloud-born data. How can you prevent this from happening? Enter the modern data warehouse, which is able to handle and excel with these new trends. It handles all types of data (Hadoop), provides a way to easily interface with all these types of data (PolyBase), and can handle "big data" and provide fast queries. Is there one appliance that can support this modern data warehouse? Yes! It is the Analytics Platform System (APS) from Microsoft (formerly called Parallel Data Warehouse, or PDW), which is a Massively Parallel Processing (MPP) appliance that has recently been updated (v2 AU1). In this session I will dig into the details of the modern data warehouse and APS. I will give an overview of the APS hardware and software architecture, identify what makes APS different, and demonstrate the increased performance. In addition I will discuss how Hadoop, HDInsight, and PolyBase fit into this new modern data warehouse.
Embarking on building a modern data warehouse in the cloud can be an overwhelming experience due to the sheer number of products that can be used, especially when the use cases for many products overlap with others. In this talk I will cover the use cases of many of the Microsoft products that you can use when building a modern data warehouse, broken down into four areas: ingest, store, prep, and model & serve. It's a complicated story that I will try to simplify, giving blunt opinions of when to use what products and the pros/cons of each.
This presentation discusses the following topics:
What is Hadoop?
Need for Hadoop
History of Hadoop
Hadoop Overview
Advantages and Disadvantages of Hadoop
Hadoop Distributed File System
Comparing: RDBMS vs. Hadoop
Advantages and Disadvantages of HDFS
Hadoop frameworks
Modules of Hadoop frameworks
Features of Hadoop
Hadoop Analytics Tools
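Since the topics above center on MapReduce, here is a minimal plain-Python sketch of the map, shuffle, and reduce phases of a word count; no Hadoop cluster is involved, and the function names are illustrative, not part of any Hadoop API:

```python
from collections import defaultdict

def map_phase(document):
    # Emit (word, 1) pairs, as a Hadoop mapper would.
    return [(word.lower(), 1) for word in document.split()]

def shuffle_phase(pairs):
    # Group values by key, mimicking Hadoop's shuffle/sort step.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    # Sum the counts for one word, as a Hadoop reducer would.
    return key, sum(values)

def word_count(documents):
    pairs = []
    for doc in documents:
        pairs.extend(map_phase(doc))
    return dict(reduce_phase(k, v) for k, v in shuffle_phase(pairs).items())

counts = word_count(["big data big insights", "data lake"])
```

In real Hadoop the shuffle is performed by the framework between the mapper and reducer tasks; only the map and reduce functions are user code.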
This presentation explains the basics of the ETL (Extract-Transform-Load) concept in relation to data solutions such as data warehousing, data migration, and data integration. CloverETL is presented closely as an example of an enterprise ETL tool. It also covers typical phases of data integration projects.
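As a rough illustration of the ETL phases described here, below is a minimal extract-transform-load sketch; the record layout and cleansing rules are invented for the example, and plain lists stand in for real source and target systems:

```python
# A minimal extract-transform-load pass over customer records.

def extract(source_rows):
    # Pull raw rows from the source system.
    return list(source_rows)

def transform(rows):
    # Cleanse: trim names, normalize email case, drop incomplete rows.
    cleaned = []
    for row in rows:
        if not row.get("email"):
            continue  # reject records failing the data-quality check
        cleaned.append({
            "name": row["name"].strip().title(),
            "email": row["email"].strip().lower(),
        })
    return cleaned

def load(rows, target):
    # Append cleansed rows to the target (a warehouse table in practice).
    target.extend(rows)
    return len(rows)

source = [
    {"name": "  ada lovelace ", "email": "Ada@Example.COM"},
    {"name": "no email", "email": ""},
]
warehouse = []
loaded = load(transform(extract(source)), warehouse)
```

Tools like CloverETL express the same three phases as configurable graph components rather than hand-written functions.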
The data lake has become extremely popular, but there is still confusion on how it should be used. In this presentation I will cover common big data architectures that use the data lake, the characteristics and benefits of a data lake, and how it works in conjunction with a relational data warehouse. Then I’ll go into details on using Azure Data Lake Store Gen2 as your data lake, and various typical use cases of the data lake. As a bonus I’ll talk about how to organize a data lake and discuss the various products that can be used in a modern data warehouse.
Tips on the importance of data warehousing, data cleansing, extraction, and more. For more details visit: http://www.skylinecollege.com/business-analytics-course
This deck covers the new IBM Voice Gateway product, which introduces a next-generation IVR system that is conversational and built on IBM Watson cognitive services. It can also transcribe calls in real time to enable agent-assist applications. Using capabilities like the Watson Conversation service, Speech to Text, and Text to Speech, the IBM Voice Gateway is built on cloud-native principles.
IBM Watson overview presented by Mike Pointer, Watson Sr. Solution Architect, at Penn State's Nittany Watson Challenge Immersion event on January 19-20, 2017.
Gain New Insights by Analyzing Machine Logs using Machine Data Analytics and BigInsights.
Half of Fortune 500 companies experience more than 80 hours of system downtime annually. Spread evenly over a year, that amounts to approximately 13 minutes every day. As a consumer, the thought of online banking operations being inaccessible that frequently is disturbing. As a business owner, when systems go down, all processes come to a stop. Work in progress is destroyed, and failure to meet SLAs and contractual obligations can result in expensive fees, adverse publicity, and loss of current and potential future customers. Ultimately, the inability to provide a reliable and stable system results in lost revenue. While the failure of these systems is inevitable, the ability to predict failures in time and intercept them before they occur is now a requirement.
A possible solution to the problem can be found in the huge volumes of diagnostic big data generated at the hardware, firmware, middleware, application, storage, and management layers indicating failures or errors. Machine analysis and understanding of this data is becoming an important part of debugging, performance analysis, root cause analysis, and business analysis. In addition to preventing outages, machine data analysis can also provide insights for fraud detection, customer retention, and other important use cases.
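As a small illustration of this kind of machine log analysis, the sketch below counts errors per component and flags likely trouble spots; the log format and threshold are assumptions made for the example, not any product's actual format:

```python
from collections import Counter

def count_errors_by_component(log_lines):
    # Each line: "<timestamp> <LEVEL> <component>: <message>" (illustrative format).
    errors = Counter()
    for line in log_lines:
        parts = line.split(maxsplit=2)
        if len(parts) == 3 and parts[1] == "ERROR":
            component = parts[2].split(":", 1)[0]
            errors[component] += 1
    return errors

def components_over_threshold(errors, threshold):
    # Flag components whose error count suggests an impending failure.
    return sorted(c for c, n in errors.items() if n >= threshold)

logs = [
    "2013-06-01T10:00:00 ERROR storage: disk latency high",
    "2013-06-01T10:00:05 INFO app: request served",
    "2013-06-01T10:00:09 ERROR storage: disk latency high",
    "2013-06-01T10:00:12 ERROR firmware: checksum mismatch",
]
errors = count_errors_by_component(logs)
flagged = components_over_threshold(errors, threshold=2)
```

At Hadoop scale the same counting would run as a MapReduce job over raw log files rather than an in-memory loop.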
Getting started with Hadoop on the Cloud with Bluemix - Nicolas Morales
Silicon Valley Code Camp -- October 11, 2014.
Session: Getting started with Hadoop on the Cloud.
Hadoop and the cloud are an almost perfect marriage. Hadoop is a distributed computing framework that leverages a cluster built on commodity hardware. The cloud simplifies provisioning of machines and software. Getting started with Hadoop on the cloud makes it simple to provision your environment quickly and actually get started using Hadoop. IBM Bluemix has democratized Hadoop for the masses! This session will provide a brief introduction to what Hadoop is and how the cloud works, and will then focus on how to get started via a series of demos. We will conclude with a discussion of the tutorials and public datasets - all of the tools needed to get you started quickly.
Learn more about BigInsights for Hadoop: https://developer.ibm.com/hadoop/
Building Confidence in Big Data - IBM Smarter Business 2013 - IBM Sverige
Success with big data comes down to confidence. Without confidence in the underlying data, decision makers may not trust and act on analytic insight. You need confidence in your data: that it is correct, trusted, and protected through automated integration, visual context, and agile governance. You need confidence in your ability to accelerate time to value, with fast deployments of big data appliances. Learn how clients have succeeded with big data by building confidence in their data, ability to deploy, and skills. Presenter: David Corrigan, Big Data specialist, IBM. More from the day at http://bit.ly/sb13se
Open source Apache Hadoop is a great framework for distributed processing of large data sets. But there’s a difference between “playing” with big data versus solving real problems. The reality is that Hadoop alone is not enough. In fact, almost every organization that plans to use Hadoop for production use quickly discovers that it lacks the required features for enterprise use. And, fewer still have the Hadoop specialists on hand to navigate through the complexity to build reliable, robust applications. As a result, many Hadoop projects never make it to production as executives say, “we just don’t have the skills.” In this session, we will discuss these enterprise capabilities and why they’re important: analytics, visualization, security, enterprise integration, developer/admin tools, and more. Additionally, we will share several real-world client examples who have found it necessary to use an enterprise-grade Hadoop platform to tackle some of the most interesting and challenging business problems.
Big data is a term that describes the large volume of data – both structured and unstructured – that inundates a business on a day-to-day basis. But it’s not the amount of data that’s important. It’s what organizations do with the data that matters. Big data can be analyzed for insights that lead to better decisions and strategic business moves.
Big data brings big promise and also big challenges, the primary one being the ability to deliver value to business stakeholders who are not data scientists!
8. IBM Big Data & Analytics
Leveraging Big Data Requires Multiple Platform Capabilities
- Integrate and govern all data sources: integration, data quality, security, ILM, MDM
- Understand and navigate federated big data sources: federated discovery and navigation
- Manage streaming data: stream computing
- Structure and control data: data warehousing
- Manage and store huge volumes of any data: Hadoop File System, MapReduce
- Analyze unstructured data: text analytics engine
10. IBM Big Data & Analytics
A Big Data Journey: Anticipating and Improving Customer Interactions
The customer: a financial and tax preparation software and services company ($4.15B revenue in 2012).
Project 1: Big Data Foundation
- Data warehousing, data quality, customer data hub
- Single view of the customer
Project 2: Analytics
- Customer behavior and segmentation analysis
- Reduced customer churn 10%
- $10M new revenue in 12 months
Project 3: Unstructured Data Analytics
- Social media analysis, log analysis, text analytics
- Augment customer profiles with new data sources
- Data warehouse cost optimization
- Data exploration
Project 4: Real-Time Analytics
- No-latency analytics
- Real-time behavior prediction
- Real-time customer segmentation
11. IBM Big Data & Analytics
IBM Big Data Platform (Cloud | Mobile | Security)
- Visualization & Discovery: gather, extract, and explore data using best-of-breed visualization
- Applications & Development and Accelerators: speed time to value with analytic and application accelerators
- Systems Management
- Hadoop System: cost-effectively analyze petabytes of structured and unstructured information
- Stream Computing: analyze streaming data and large data bursts for real-time insights
- Data Warehouse: deliver deep insight with advanced in-database analytics and operational analytics
- Contextual Discovery: index and federated discovery for contextual, collaborative insights
- Information Integration & Governance: govern data quality and manage the information lifecycle
Surrounding layers: Solutions; Analytics and Decision Management; Big Data Platform and Application Frameworks; Big Data Infrastructure.
12. IBM Big Data & Analytics
An example of the big data platform in practice
- Ingestion and Real-time Analytic Zone: Streams, connectors
- Landing and Analytics Sandbox Zone: Hadoop (MapReduce), Hive/HBase, column stores, documents in a variety of formats
- Warehousing Zone: enterprise warehouse, data marts
- Analytics and Reporting Zone: BI & reporting, predictive analytics, visualization & discovery
- Metadata and Governance Zone: ETL, MDM, data governance
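One way to picture the zone layout above is as a routing decision for incoming records. The sketch below uses the slide's zone names, but the routing rules themselves are illustrative assumptions, not part of any IBM product:

```python
def route_to_zone(record):
    # Route an incoming record to a platform zone (rules are illustrative).
    if record.get("streaming"):
        return "ingestion-and-real-time-analytic"
    if record.get("format") in ("json", "log", "document"):
        return "landing-and-analytics-sandbox"   # Hadoop: Hive/HBase, MapReduce
    if record.get("curated"):
        return "warehousing"                     # enterprise warehouse / data marts
    return "landing-and-analytics-sandbox"       # default: keep raw data for exploration

batch = [
    {"streaming": True},
    {"format": "json"},
    {"curated": True, "format": "table"},
]
zones = [route_to_zone(r) for r in batch]
```

The metadata and governance zone is deliberately absent here: it spans all of the others rather than receiving data itself.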
13. IBM Big Data & Analytics
Example: Integrate big data sources with enterprise data
- IBM Big Data Platform: InfoSphere BigInsights and PureData Systems, covering data in motion and data at rest, plus other sources
- IBM Business Analytics: SPSS Modeler (predictive), Cognos RTM (real-time analytics), Cognos Consumer Insight (social media analysis), Cognos BI (reporting/analysis, dashboards), and Cognos Insight (export and explore)
17. IBM Big Data & Analytics
BigInsights Enterprise Edition (mixing open source and IBM components)
- Connectivity and integration: Streams, Netezza, DB2, JDBC, Flume, Sqoop, web crawler, distributed file copy, DB export/import, Boardreader
- Infrastructure: HDFS, MapReduce, Adaptive MapReduce, HBase, Hive, Pig, Jaql, ZooKeeper, Oozie, HCatalog, Lucene indexing, GPFS (EAP), text compression, enhanced security, flexible scheduler
- Analytics and discovery "apps": text processing engine and library, BigSheets, ad hoc query, machine learning, data processing, and more
- Administrative and development tools:
  - Web console: monitor cluster health and jobs, add/remove nodes, start/stop services, inspect job and workflow status, deploy applications, launch apps/jobs, work with the distributed file system and spreadsheet interface, REST-based API
  - Eclipse tools: text analytics; MapReduce programming; Jaql, Hive, and Pig development; BigSheets plug-in development; Oozie workflow generation
  - Integrated installer; R support
- Optional IBM and partner offerings: Cognos BI, accelerator for machine data analysis, accelerator for social data analysis, Guardium, DataStage, Data Explorer
18. IBM Big Data & Analytics
Stream Computing Represents a Paradigm Shift
Traditional computing (historical fact finding):
- Find and analyze information stored on disk
- Batch paradigm, pull model
- Query-driven: submits queries to static data
Stream computing (current fact finding, real-time analytics):
- Analyze data in motion, before it is stored
- Low-latency paradigm, push model
- Data-driven: bring data to the analytics
19. IBM Big Data & Analytics
Big Data in real time with InfoSphere Streams
Typical operations: filter/sample, annotate, fuse, classify, modify, score, windowed aggregates, and analyze.
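The operations above (filter, annotate, score, windowed aggregates, and so on) can be imitated as a plain-Python pipeline. This is only an illustration of the idea, not Streams SPL; the class and threshold are invented for the example:

```python
from collections import deque

class WindowedAverage:
    # Sliding-window aggregate over the last `size` readings.
    def __init__(self, size):
        self.window = deque(maxlen=size)

    def add(self, value):
        self.window.append(value)
        return sum(self.window) / len(self.window)

def process(stream, threshold):
    # filter -> windowed aggregate -> annotate/score, tuple by tuple.
    agg = WindowedAverage(size=3)
    out = []
    for reading in stream:
        if reading is None:          # filter: drop malformed tuples
            continue
        avg = agg.add(reading)       # windowed aggregate
        out.append({"value": reading, "avg": avg, "alert": avg > threshold})
    return out

results = process([10, None, 20, 30, 40], threshold=25)
```

The key property of stream computing shows up even here: each tuple is scored as it arrives, with no pass over stored data.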
20. IBM Big Data & Analytics
Analytic Accelerators Designed for Velocity (and Variety)
- Mining in microseconds (included with Streams)
- Statistics (included with Streams)
- Simple and advanced text, e.g. "(listen, verb), (radio, noun)" (included with Streams; IBM Research; open source UIMA)
- Image and video (open source)
- Acoustic (IBM Research; open source)
- Geospatial (IBM Research)
- Predictive (IBM Research)
- Advanced mathematical models (IBM Research)
21. IBM Big Data & Analytics
Putting it all together: an end-to-end big data solution
- Engines and stores: InfoSphere Streams (fed by streaming data sources), InfoSphere BigInsights, the Netezza appliance, and InfoSphere Warehouse
- Analytics: IBM SPSS and IBM Cognos, spanning discover, model, visualize & publish, score, and measure
25. IBM Big Data & Analytics
Cognos Business Intelligence optimized for Big SQL
• Big SQL enables the Cognos BI server to delegate many types of analytical computations to BigInsights MapReduce processing, instead of computing them locally at a performance cost as it would with Hive
• Faster response times, because more query processing can occur close to the data
• Not hindered by the latency and other limitations of querying Hadoop via Hive
Architecture: Cognos BI Server (Explore & Analyze, Report & Act) -> SQL interface via JDBC -> InfoSphere BigInsights (MapReduce application layer over HBase / HDFS storage); Hive is the alternative access path
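The "process close to the data" point generalizes beyond Big SQL. As a rough analogy only (using SQLite from Python's standard library, not Big SQL or JDBC), compare shipping an aggregate query to the engine with pulling every row back to the client and computing locally:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("east", 100.0), ("east", 50.0), ("west", 75.0)])

# Delegated: the aggregation runs inside the engine, near the data;
# only one small result row crosses the interface.
total = conn.execute(
    "SELECT SUM(amount) FROM sales WHERE region = 'east'").fetchone()[0]

# Local: every matching row is shipped to the client, which then
# aggregates -- the pattern the slide says delegation avoids.
rows = conn.execute(
    "SELECT amount FROM sales WHERE region = 'east'").fetchall()
local_total = sum(amount for (amount,) in rows)

print(total, local_total)  # -> 150.0 150.0
```

Both answers match; at Hadoop scale, the difference is how many rows travel across the JDBC interface before the aggregate is computed.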
26. IBM Big Data & Analytics
Performance – Cognos BI + DB2 BLU
• Stack: Cognos BI + DB2 with BLU + Power
• 38x average acceleration of database queries for reporting [2]
• Faster cube load* and faster DB query* across Dynamic Query, Compatible Query, and Dynamic Cubes workloads
[2] Based on internal tests.
27. IBM Big Data & Analytics
IBM PureData Systems: Meeting Big Data Challenges – Fast and Easy!
• System for Transactions – for apps like e-commerce: database cluster services optimized for transactional throughput and scalability
• System for Analytics – for apps like customer analysis: data warehouse services optimized for high-speed, peta-scale analytics and simplicity
• System for Operational Analytics – for apps like real-time fraud detection: operational data warehouse services optimized to balance high-performance analytics and real-time operational throughput
• System for Hadoop – for exploratory analysis & queryable archive: Hadoop data services optimized for big data analytics and online archive, with appliance simplicity
29. IBM Big Data & Analytics
Every Industry can Leverage Big Data and Analytics.

Insurance
• 360° View of Domain or Subject
• Catastrophe Modeling
• Fraud & Abuse

Banking
• Optimizing Offers and Cross-sell
• Customer Service and Call Center Efficiency

Telco
• Pro-active Call Center
• Network Analytics
• Location Based Services

Energy & Utilities
• Smart Meter Analytics
• Distribution Load Forecasting/Scheduling
• Condition Based Maintenance

Media & Entertainment
• Business Process Transformation
• Audience & Marketing Optimization

Retail
• Actionable Customer Insight
• Merchandise Optimization
• Dynamic Pricing

Travel & Transport
• Customer Analytics & Loyalty Marketing
• Predictive Maintenance Analytics

Consumer Products
• Shelf Availability
• Promotional Spend Optimization
• Merchandising Compliance

Government
• Civilian Services
• Defense & Intelligence
• Tax & Treasury Services

Healthcare
• Measure & Act on Population Health Outcomes
• Engage Consumers in their Healthcare

Automotive
• Advanced Condition Monitoring
• Data Warehouse Optimization

Life Sciences
• Increase visibility into drug safety and effectiveness

Chemical & Petroleum
• Operational Surveillance, Analysis & Optimization
• Data Warehouse Consolidation, Integration & Augmentation

Aerospace & Defense
• Uniform Information Access Platform
• Data Warehouse Optimization

Electronics
• Customer / Channel Analytics
• Advanced Condition Monitoring
31. IBM Big Data & Analytics
A Catalyst for ISV and Partner Innovation

Traditional Approach -> Transformational Outcomes
• Customer segmentation based on loyalty data -> Capturing information from all interactions to improve customer lifetime value
• Historical analysis of subscriber data -> 2 million events analyzed per minute, delivering real-time insight to mobile operators
• Managing rising cost of care -> Combining data from hundreds of hospitals to improve results across the healthcare continuum
• Anti-corruption and bribery compliance program -> Using big data analytics to prioritize and isolate areas of risk or rogue activity
• Treat-first, seek-payment-later and write off bad debt -> Measure and predict patient payment behavior, reduce risk from bad debt, and boost collection rates
• Manual supply chain integration -> Visibility, analysis and reporting across the entire supply chain (planning -> execution)
• Random parking meter patrols & search for open spots -> Analyzing parking systems to maximize revenue & improve the parking experience in cities
32. IBM Big Data & Analytics
Get started! Think Big – Pick your Spot – Execute and Deliver Value
• Identify and prioritize business use cases
• New insights and new possibilities
• New revenue opportunities
• Process and performance improvement
• Evolve your existing analytics capabilities
• Build or acquire new skills required
• Ensure that the business is engaged
• Agree on the key measures for success
• Measure and communicate success