This document introduces BigSheets, a data discovery tool that is part of IBM's BigInsights Hadoop product. BigSheets allows users to explore large datasets without writing code. It provides visualizations and works similarly to a spreadsheet. The document demonstrates how BigSheets can gather data from various sources, import structured and unstructured data, and provide interactive visualizations to analyze trends. BigSheets helps users get started with improving their business using big data.
Hadoop Summit Japan 2011 Fall - LT by IBM
1. Data Discovery Tool
BigSheets
MapReduce with No Coding?
Atsushi Tsuchiya (eAtsuhsi@JP.ibm.com)
Big Data Tiger Team
IBM Software
2. Looking at Data
• What would you do with Big data?
• How to make use of it?
• It is difficult! – too vague.
• No specific problem that needs to be solved.
• No specific question that needs to be answered.
• All you know is that you want to improve the business.
• But you have *data*
• So, what would you do first?
Looking at Data!
3. IBM with Hadoop
• IBM has been working with the open source community for a long time.
– Eclipse, Hadoop and so on …
• BigInsights includes Hadoop
4. BigInsights
• BigInsights is IBM's Hadoop product for Big data analytics.
– Basic Edition (up to 10TB) – free to use!
– Enterprise Edition
• Next version of BigInsights coming soon.
– v1.2 available.
• And many more
5. BigInsights Components
• BigInsights includes:
– IBM Java
– JAQL - a language developed by IBM (open source)
– IBM Distribution of Hadoop
– BigSheets - data discovery tool
– FLEX scheduler for Adaptive MapReduce
– Orchestrator (Workflow Engine)
– SystemT (Text Analytics), SystemML (Machine Learning)
– LDAP
– Web Console / Developer Studio
6. BigInsights – Basic Edition
Function | Version (to be updated in the Nov release) | Basic Edition | Enterprise Edition
Integrated Install | | Inc | Inc
Open Source components:
Hadoop (including common utilities, HDFS, MapReduce framework) | 0.20.2 | Inc | Inc
Jaql (programming / query language) | 0.5.2 | Inc | Inc
Pig (programming / query language) | 0.7 | Inc | Inc
Flume (data collection/aggregation) | 0.9.1 | Inc | Inc
Hive (data summarization/querying) | 0.5 | Inc | Inc
Lucene (text search) | 3.0.2 | Inc | Inc
Zookeeper (process coordination) | 3.2.2 | Inc | Inc
Avro (data serialization) | 1.3.0 | Inc | Inc
HBase (real time read/write) | 0.20.6 | Inc | Inc
Oozie (workflow / job orchestration) | 2.2.2 | Inc | Inc
Online documentation | | Inc | Inc
Capability to integrate with DB2, InfoSphere Warehouse | | Inc | Inc
Two DB2 UDFs to submit jobs and read results from BigInsights
7. BigInsights – Enterprise Edition
Function | Basic Edition | Enterprise Edition
R Connector: Jaql module to invoke R statistical capabilities from BigInsights | n/a | Inc
Netezza Connector: Jaql modules to read/write data from/to Netezza | n/a | Inc
LDAP | n/a | Inc
Web Console | n/a | Inc
Workflow Engine | n/a | Inc
Scheduler (Orchestrator) | n/a | Inc
Text Analytics Module (System T) | n/a | Inc
Eclipse support (for System T)* | n/a | Inc
BigSheets – Data Discovery Tool | n/a | Inc
IBM Optim Development Studio V2.2.1.0 | n/a | Inc
Support by IBM | n/a | Inc
9. BigSheets Concept Model
No Coding is Required!
[Diagram labels: Internet, Intranet, Logs, Other; Gather; BigSheets in BigInsights; Explore; Enrich; Inspect; Get/Manipulate; Massive Results; Explore & Analyze; Publish]
13. Gather
[Diagram labels: Internet, Intranet, Logs, Other; Gather; BigSheets; BigInsights]
• BigInsights can gather data from
– Predefined formats:
• BigSheets data reader
• Basic crawler data reader
• Basic crawler data reader (binary support)
• Character‐delimited data reader
• Tab Separated Value (TSV) data reader
• JavaScript Object Notation (JSON) array reader
• Comma Separated Value (CSV) data reader
– Custom BigSheets reader
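The idea of one reader per input format can be illustrated with a minimal Python sketch. This is not BigSheets code and does not use any BigSheets API: the function names, the `READERS` registry, and the format keys are all hypothetical, chosen only to show how delimited text and a JSON array can both end up as spreadsheet-like rows.

```python
import csv
import io
import json


def read_delimited(text, delimiter=","):
    """Parse character-delimited text into a list of rows (lists of strings)."""
    return [row for row in csv.reader(io.StringIO(text), delimiter=delimiter)]


def read_json_array(text):
    """Parse a JSON array of objects into a header row plus data rows."""
    records = json.loads(text)
    if not isinstance(records, list):
        raise ValueError("expected a JSON array at the top level")
    # Collect column names in first-seen order so the sheet layout is stable.
    columns = []
    for record in records:
        for key in record:
            if key not in columns:
                columns.append(key)
    rows = [[str(record.get(col, "")) for col in columns] for record in records]
    return [columns] + rows


# Hypothetical registry: format name -> reader function (not a BigSheets API).
READERS = {
    "csv": lambda text: read_delimited(text, ","),
    "tsv": lambda text: read_delimited(text, "\t"),
    "json-array": read_json_array,
}


def read_sheet(text, fmt):
    """Look up the reader registered for `fmt` and return the parsed rows."""
    try:
        reader = READERS[fmt]
    except KeyError:
        raise ValueError(f"no reader registered for format {fmt!r}")
    return reader(text)
```

A custom reader would, in this sketch, simply be another function added to the registry under a new format name.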
14. Gather
[Diagram labels: Internet, Intranet, Logs, Other; Gather; BigSheets; BigInsights]
• BigInsights can import structured and unstructured data
– CSV
– Files
– Network
• http
• hdfs
• AWS (S3n/S3)
– Other
• Custom importer
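As a rough illustration of how the import sources listed above differ, the Python sketch below classifies a location string by its URI scheme. The category labels and the `classify_source` function are assumptions made for this example; they are not part of BigInsights, and the sketch does not fetch any data.

```python
from urllib.parse import urlparse

# Schemes named on the slide; the category labels are illustrative only.
SCHEME_KINDS = {
    "http": "network/http",
    "hdfs": "network/hdfs",
    "s3": "network/aws-s3",
    "s3n": "network/aws-s3",
    "file": "files",
}


def classify_source(location):
    """Return an illustrative source category for a path or URI."""
    scheme = urlparse(location).scheme.lower()
    if not scheme:
        # A bare path with no scheme is treated as a local file.
        return "files"
    # Anything unrecognized would need a custom importer in this sketch.
    return SCHEME_KINDS.get(scheme, "other (custom importer)")
```

For example, `classify_source("s3n://bucket/key")` falls into the AWS category, while an unlisted scheme such as `ftp://` falls through to the custom-importer case.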
15. Collection
[Diagram labels: Internet, Intranet, Logs, Other; Collection; BigSheets; BigInsights]
A complete list of McDonald's in North America.
16. Import, Reformat, Calculate
[Diagram labels: Internet, Intranet, Logs, Other; BigSheets; BigInsights; Import; Reformat; Calculate]
A complete list of McDonald's in North America.
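The three steps named on this slide (Import, Reformat, Calculate) can be sketched in plain Python to show what a spreadsheet-style tool does on the user's behalf. Everything here is an assumption for illustration: the sample rows are made up, the column names are hypothetical, and none of the functions correspond to BigSheets features or APIs.

```python
import csv
import io
from collections import Counter

# Made-up sample rows standing in for an imported store list.
SAMPLE_CSV = """name,city,region
Store A, toronto ,on
Store B,Chicago,IL
Store C,chicago,il
"""


def import_rows(text):
    """Import: parse CSV text into a list of dictionaries keyed by column."""
    return list(csv.DictReader(io.StringIO(text)))


def reformat(rows):
    """Reformat: trim whitespace and normalize capitalization per column."""
    return [
        {
            "name": row["name"].strip(),
            "city": row["city"].strip().title(),
            "region": row["region"].strip().upper(),
        }
        for row in rows
    ]


def calculate_counts(rows, column):
    """Calculate: count how many rows share each value of `column`."""
    return Counter(row[column] for row in rows)


counts = calculate_counts(reformat(import_rows(SAMPLE_CSV)), "region")
```

A per-region count like `counts` is the kind of summary that a column chart or heat map could then display.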
17. Visualizations
[Diagram labels: Internet, Intranet, Logs, Other; BigSheets; BigInsights]
Column chart
Heat map