Operational BI addresses the decision-making challenges organizations face today by enabling the exchange of information between decision makers and alerting the right people at the right time, transcending organizational boundaries.
Database Architechs has been a database-focused consulting company for 17 years, bringing you the most skilled and experienced data and database experts, with a wide variety of service offerings covering all database- and data-related aspects.
TeleManagement Forum OSSera Case Study - AIS Thailand Service Manager Present... – Mingxia Zhang, Ph.D.
Tuesday, February 7th, 5:30 - 5:50 PM
Using Frameworx in Implementing a Unified Service Management Tool – Improving Organizational Collaboration and Communication
Examining the drivers for developing a Unified Service Management Tool to improve business processes at the service level in the Strategy, Infrastructure, and Product (SIP) area as well as Operations.
Outlining the development of an enterprise-wide Service Management application, which enabled solidification of the Service Development and Management processes in the SIP area and Service Management and Operation processes
Quantifying the benefits in terms of information sharing, process unification/implementation, cost savings, and revenue increases in service management
Introduction to SQL Server Master Data Services – Eduardo Castro
In this presentation we give an introduction to SQL Server 2008 R2 Master Data Services.
Regards,
Ing. Eduardo Castro Martínez, PhD – Microsoft SQL Server MVP
http://mswindowscr.org
http://comunidadwindows.org
Costa Rica
http://ecastrom.blogspot.com
http://ecastrom.wordpress.com
http://ecastrom.spaces.live.com
http://universosql.blogspot.com
http://todosobresql.blogspot.com
http://todosobresqlserver.wordpress.com
http://mswindowscr.org/blogs/sql/default.aspx
http://citicr.org/blogs/noticias/default.aspx
http://sqlserverpedia.blogspot.com/
Razorfish Multi-Channel Marketing: Better Customer Segmentation and Targeting – Teradata Aster
Matt Comstock, Vice President Business Intelligence Office, Razorfish, presents at the Big Analytics 2012 Roadshow.
From search to email to social, customers are interacting with your brand across a variety of channels. But what do people do once they view an advertisement or get an email? What common behaviors do they display once they’re on your site? By combining media exposure/behavior, site-side media, and in-store purchase data, you can better understand the impact media has on driving value to your business. Come to this session to learn how data-driven multi-channel analysis lets you see what consumers do before they become customers and understand what content influences which segments of users by media audience. Discover new segmentation and targeting strategies to improve engagement with your brand and increase advertising lift. See how a leader in digital marketing uses a combination of technologies including Teradata Aster, Hadoop, and Amazon Web Services to handle big data and provide big analytics that improve business value.
Optimize Asset Value and Performance with Enterprise Content Management – SAP Solution Extensions
The SAP Extended Enterprise Content Management application by OpenText bridges the gap between SAP structured data and unstructured content for asset management processes.
For capital-intensive industries, this contributes to improved productivity, reliability, compliance, and worker safety and reduces costs and environmental impact.
Analytic Platforms in the Real World with 451 Research and Calpont, July 2012 – Calpont Corporation
Matt Aslett, 451 Research, and Bob Wilkinson, VP Engineering for Calpont, discuss the emergence of the analytic platform, its place in the new ecosystem for Big Data, considerations for selection, and applied use cases of Calpont’s analytic platform, InfiniDB, in telco and mobile advertising.
How CBS Interactive uses Cloudera Manager to effectively manage their Hadoop ... – Cloudera, Inc.
Manoj Murumkar, Senior Manager, Data Engineering at CBS Interactive and Bala Venkatrao, Director of Products at Cloudera provide insight into how CBSi uses Cloudera Manager to effectively manage the complete lifecycle of CBSi’s Hadoop operations. They will highlight the Cloudera Manager features that have been most helpful to CBSi and share best practices on how to use the tool. This webinar will conclude with a preview of the future roadmap for Cloudera Manager.
AllAccessSAP 2012 Finale - SAP Slides (incl. links) – BI Brainz Group
The actual SAP slide deck from our webinar. Click here to gain full details about the event http://everythingxcelsius.com/businessobjects/recording-allaccesssap-xcelsius-sod-update-2013-sap-bi-roadmap-stats/5531
Trending use cases have pointed out the complementary nature of Hadoop and existing data management systems—emphasizing the importance of leveraging SQL, engineering, and operational skills, as well as incorporating novel uses of MapReduce to improve distributed analytic processing. Many vendors have provided interfaces between SQL systems and Hadoop but have not been able to semantically integrate these technologies while Hive, Pig and SQL processing islands proliferate. This session will discuss how Teradata is working with Hortonworks to optimize the use of Hadoop within the Teradata Analytical Ecosystem to ingest, store, and refine new data types, as well as exciting new developments to bridge the gap between Hadoop and SQL to unlock deeper insights from data in Hadoop. The use of Teradata Aster as a tightly integrated SQL-MapReduce® Discovery Platform for Hadoop environments will also be discussed.
Innovation Webinar - Using IFS Applications BI to drive business excellence – IFS
Studies show that best-in-class businesses—those with the best operating margins and turnover growth in their industries—have clearly defined objectives supported by a Business Intelligence solution. In this session, we’ll look at specific features in the IFS Applications Business Intelligence solution. See how easily these features can help you support strategic business initiatives and reach improved operational results.
The Business Advantage of Hadoop: Lessons from the Field – Cloudera Summer We... – Cloudera, Inc.
Join 451 Analyst Matt Aslett, Cloudera CEO Mike Olson, and Cloudera customers RIM and YP (formerly AT&T Interactive) to learn:
» Why Cloudera customers have chosen CDH to get started with Hadoop
» The business value resulting from analyzing new data sources in new ways
» How Hadoop will change these customers’ businesses and industries over the next 3-5 years
DB2 for z/OS Update: Data Warehousing on System z – Surekha Parekh
Abstract:
Data warehouses form the foundation of most business analytics solutions. Recent analysis reveals growing demand for near-real-time data, as well as for integration with day-to-day business applications.
IBM will share insight on today’s business environment and its impact on an IT organization’s ability to deliver a competitive Business Analytics and Data Warehousing strategy. You will learn how DB2 for z/OS, combined with other InfoSphere data movement offerings from IBM, can help enable information on demand.
Simplifying Big Data Analytics for the Business – Teradata Aster
Tasso Argyros, Co-Founder & Co-President, Teradata Aster presents at the 2012 Big Analytics Roadshow.
The opportunity exists for organizations in every industry to unlock the power of iterative, big data analysis with new applications such as digital marketing optimization and social network analysis to improve their bottom line. Big data analysis is not just the ability to analyze large volumes of data, but the ability to analyze more varieties of data by performing more complex analysis than is possible with more traditional technologies. This session will demonstrate how to bring the science of data to the art of business by empowering more business users and analysts with operationalized insights that drive results. See how data science is making emerging analytic technologies more accessible to businesses while providing better manageability to enterprise architects across retail, financial services, and media companies.
Similar to Compegence: Nagaraj Kulkarni - Hadoop and No SQL_TDWI_2011Jul23_Preso (20)
Dr. Sean Tan, Head of Data Science, Changi Airport Group
Discover how Changi Airport Group (CAG) leverages graph technologies and generative AI to revolutionize their search capabilities. This session delves into the unique search needs of CAG’s diverse passengers and customers, showcasing how graph data structures enhance the accuracy and relevance of AI-generated search results, mitigating the risk of “hallucinations” and improving the overall customer journey.
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ... – James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. Constant focus on speed to release software to market, along with the traditional slow and manual security checks has caused gaps in continuous security as an important piece in the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with a passion for making things work and a knack for helping others understand how things work. He brings around 20 years of solution-engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations on CI/CD and on application security integrated into the software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
Generative AI Deep Dive: Advancing from Proof of Concept to Production – Aggregage
Join Maher Hanafi, VP of Engineering at Betterworks, in this new session where he'll share a practical framework to transform Gen AI prototypes into impactful products! He'll delve into the complexities of data collection and management, model selection and optimization, and ensuring security, scalability, and responsible use.
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs – Alex Pruden
This paper presents Reef, a system for generating publicly verifiable succinct non-interactive zero-knowledge proofs that a committed document matches or does not match a regular expression. We describe applications such as proving the strength of passwords, the provenance of email despite redactions, the validity of oblivious DNS queries, and the existence of mutations in DNA. Reef supports the Perl Compatible Regular Expression syntax, including wildcards, alternation, ranges, capture groups, Kleene star, negations, and lookarounds. Reef introduces a new type of automata, Skipping Alternating Finite Automata (SAFA), that skips irrelevant parts of a document when producing proofs without undermining soundness, and instantiates SAFA with a lookup argument. Our experimental evaluation confirms that Reef can generate proofs for documents with 32M characters; the proofs are small and cheap to verify (under a second).
Paper: https://eprint.iacr.org/2023/1886
Maruthi Prithivirajan, Head of ASEAN & IN Solution Architecture, Neo4j
Get an inside look at the latest Neo4j innovations that enable relationship-driven intelligence at scale. Learn more about the newest cloud integrations and product enhancements that make Neo4j an essential choice for developers building apps with interconnected data and generative AI.
Sudheer Mechineni, Head of Application Frameworks, Standard Chartered Bank
Discover how Standard Chartered Bank harnessed the power of Neo4j to transform complex data access challenges into a dynamic, scalable graph database solution. This keynote will cover their journey from initial adoption to deploying a fully automated, enterprise-grade causal cluster, highlighting key strategies for modelling organisational changes and ensuring robust disaster recovery. Learn how these innovations have not only enhanced Standard Chartered Bank’s data infrastructure but also positioned them as pioneers in the banking sector’s adoption of graph technology.
The Art of the Pitch: WordPress Relationships and Sales – Laura Byrne
Clients don’t know what they don’t know. What web solutions are right for them? How does WordPress come into the picture? How do you make sure you understand scope and timeline? What do you do if something changes?
All these questions and more will be explored as we talk about matching clients’ needs with what your agency offers, without pulling teeth or pulling your hair out. Practical tips and strategies for successful relationship building that leads to closing the deal.
Pushing the limits of ePRTC: 100ns holdover for 100 days – Adtran
At WSTS 2024, Alon Stern explored the topic of parametric holdover and explained how recent research findings can be implemented in real-world PNT networks to achieve 100 nanoseconds of accuracy for up to 100 days.
Enhancing adoption of Open Source Libraries. A case study on Albumentations.AI – Vladimir Iglovikov, Ph.D.
Presented by Vladimir Iglovikov:
- https://www.linkedin.com/in/iglovikov/
- https://x.com/viglovikov
- https://www.instagram.com/ternaus/
This presentation delves into the journey of Albumentations.ai, a highly successful open-source library for data augmentation.
Created out of a necessity for superior performance in Kaggle competitions, Albumentations has grown to become a widely used tool among data scientists and machine learning practitioners.
This case study covers various aspects, including:
People: The contributors and community that have supported Albumentations.
Metrics: The success indicators such as downloads, daily active users, GitHub stars, and financial contributions.
Challenges: The hurdles in monetizing open-source projects and measuring user engagement.
Development Practices: Best practices for creating, maintaining, and scaling open-source libraries, including code hygiene, CI/CD, and fast iteration.
Community Building: Strategies for making adoption easy, iterating quickly, and fostering a vibrant, engaged community.
Marketing: Both online and offline marketing tactics, focusing on real, impactful interactions and collaborations.
Mental Health: Maintaining balance and not feeling pressured by user demands.
Key insights include the importance of automation, making the adoption process seamless, and leveraging offline interactions for marketing. The presentation also emphasizes the need for continuous small improvements and building a friendly, inclusive community that contributes to the project's growth.
Vladimir Iglovikov brings his extensive experience as a Kaggle Grandmaster and ex-Staff ML Engineer at Lyft, sharing valuable lessons and practical advice for anyone looking to enhance the adoption of their open-source projects.
Explore more about Albumentations and join the community at:
GitHub: https://github.com/albumentations-team/albumentations
Website: https://albumentations.ai/
LinkedIn: https://www.linkedin.com/company/100504475
Twitter: https://x.com/albumentations
Threats to mobile devices are more prevalent and increasing in scope and complexity. Users of mobile devices want to take full advantage of the features available on those devices, but many of those features trade security for convenience and capability. This best-practices guide outlines steps users can take to better protect personal devices and information.
UiPath Test Automation using UiPath Test Suite series, part 5 – DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series, part 5. In this session, we will cover CI/CD with DevOps.
Topics covered:
CI/CD within UiPath
End-to-end overview of a CI/CD pipeline with Azure DevOps
Speaker:
Lyndsey Byblow, Test Suite Sales Engineer @ UiPath, Inc.
Epistemic Interaction - tuning interfaces to provide information for AI support – Alan Dix
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
Compegence: Nagaraj Kulkarni - Hadoop and No SQL_TDWI_2011Jul23_Preso
1. Co-Creating COMPetitive intelliGENCE through
Process, Data and Domain driven Information Excellence
Hadoop and No SQL
PRESENTED in TDWI India, Hyderabad (2011 July)
Nagaraj Kulkarni
Hadoop and No SQL Slide 1 2011 Jul
2. Process, Data and Domain Integrated Approach
Decision Excellence – systemic competitive advantage lies in the exploitation of:
–More detailed and specific information
–More comprehensive external data & dependencies
–Fuller integration
–More in-depth analysis
–More insightful plans and strategies
–More rapid response to business events
–More precise and apt response to customer events
Supporting dimensions: market actions and actionable changes; a usable business process landscape; business systems intent; timely information; flexible usage of business domain data; scalable cost; sustainable effort; skills & competency.
Ram Charan’s book: What The CEO Wants You To Know: How Your Company Really Works
COMPEGENCE – Information Excellence Foundation
3. Touching Upon
Context for Big Data challenges
Database systems – pre-Hadoop strengths and limitations
What is scale; why NoSQL
Think Hadoop; the Hadoop ecosystem
Think Map Reduce
Nail down Map Reduce
Think GRID (distributed architecture)
Deployment options
What Map Reduce is not, and Map Reduce usages
Nail down HDFS and GRID architecture
5. Systemic Changes
Connected Globe: boundarylessness, best sourcing, interlinked culture
Customer Centric: demand-side focus, bottom-up innovation, empowered employees, leading trends
Response Time: agility and responsiveness; speed, agility, flexibility
6. Landscape To Address
Data Explosion: manageability, scalability, performance, agility
Information Overload: decision making, time to action
Interlinked Processes & Systems: boundaryless, systemic understanding, collaborate and synergize, simplify and scale
7. Information Overload
“A wealth of information creates a poverty of attention.”
– Herbert Simon, Nobel Laureate Economist
9. Scale – What is it?
10. How Do We Scale?
Traditional systems achieve scalability through:
Multi-threading
Multiple CPUs – parallel processing
Distributed programming – SMP & MPP
ETL load distribution – assigning jobs to different nodes
Improved throughput
11. Scale – What Is It About?
Internet: 1.73 billion users; 247 billion emails per day; 126 million blogs
Facebook: 500 million active users per month; 500 billion+ page views per month; 5 billion content items per week; 15 TB of new data per day; a 1,200-machine, 21 PB cluster
eBay: 90 million active users; 10 billion requests per day; 25 billion+ content items per month; 220 million+ items on sale; 40 TB+ per day; 40 PB of data
Twitter: 50 million tweets per day; 1 TB+ per day; 80+ nodes
Yahoo: 82 PB of data; 25,000+ nodes
80% of this data is unstructured – an estimated 800 GB of data per user (a million petabytes!)
12. How Do We Scale – Think Numbers
Thinking of scale – the need for a grid. Consider web-server logs flowing over a data highway into a datamart for log storage and processing:
1,000 nodes per data center (n1–nn), 10 data centers (dc1, dc2, …), 100 Mbps pipes
1 KB per web-server log record, 1 record per second per node
In one day: 1000 * 10 * 1 KB * 60 * 60 * 24 = 864 GB
Storage for a year: 864 GB * 365 = 315 TB
To store 1 PB: 40K * 1000 = millions of dollars
To process 1 TB: ~1000 minutes, i.e. about 17 hrs
Think agility and flexibility.
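The back-of-envelope arithmetic on this slide can be checked with a short script (a sketch; the node counts, record size, and record rate are the slide's illustrative assumptions, not measurements):

```python
# Back-of-envelope check of the slide's log-volume arithmetic.
nodes_per_dc = 1000      # nodes per data center (slide's assumption)
data_centers = 10
record_bytes = 1_000     # 1 KB web-server log record
seconds_per_day = 60 * 60 * 24

# One record per second per node, across all nodes and data centers.
per_day_gb = nodes_per_dc * data_centers * record_bytes * seconds_per_day / 1e9
per_year_tb = per_day_gb * 365 / 1000

print(f"{per_day_gb:.0f} GB/day")    # 864 GB/day
print(f"{per_year_tb:.0f} TB/year")  # 315 TB/year
```

This reproduces the slide's 864 GB/day and ~315 TB/year figures, which is the point of the exercise: at these volumes a single machine cannot keep up, hence the grid.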
13. Scale – What Is It About?
Volume, speed, integration level, and more…
Does it scale linearly with data size and analysis complexity?
14. We Would Have No Issues…
…if the following assumptions held good (the classic fallacies of distributed computing):
The network is reliable.
Latency is zero.
Bandwidth is infinite.
The network is secure.
Topology doesn't change.
There is one administrator.
Transport cost is zero.
The network is homogeneous.
16. New Paradigm: Go Back to Basics
Divide and Conquer (divide, delegate, and get done)
Move Work or Workers?
Relax Constraints (pre-defined data models)
Expect and Plan for Failures (avoid and address failures); community backup
Assembly Line Processing (scale, speed, efficiency, commodity worker)
The “for loop”
Parallelization (trivially parallelizable)
Infrastructure and Supervision (GRID architecture)
Manage Dependencies
Ignore the Trivia (trivia is relative!)
References: Joel Spolsky, http://www.joelonsoftware.com/items/2006/08/01.html; Charlie Munger’s Mental Models
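The “for loop” and “divide and delegate” points above can be sketched in a few lines of Python: when iterations are independent (trivially parallelizable), the same loop body can be handed to a pool of commodity workers. This is a toy illustration of the idea, not Hadoop; the `work` function is a made-up example.

```python
from concurrent.futures import ProcessPoolExecutor

def work(x):
    # Each iteration is independent of the others,
    # so it can be delegated to any worker.
    return x * x

if __name__ == "__main__":
    data = range(10)
    # The plain sequential "for loop":
    sequential = [work(x) for x in data]
    # The same loop, split and delegated across worker processes:
    with ProcessPoolExecutor(max_workers=4) as pool:
        parallel = list(pool.map(work, data))
    # Dividing the work does not change the answer.
    assert parallel == sequential
```

The grid-architecture items on the next slide (replication, heartbeats, task restart) are what a real framework adds on top of this delegation so that worker failures are expected and handled rather than fatal.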
17. New Paradigm: Go Back to Basics
Map Reduce Paradigm: divide and conquer; the “for loop”; sort and shuffle; parallelization (trivially parallelizable); relaxed data constraints; assembly-line processing (scale, speed, efficiency, commodity worker)
Grid Architecture: split and delegate; move work or workers; expect and plan for failures; manage dependencies and failures; ignore the trivia (trivia is relative!); replication, redundancy, heart-beat checks, cluster rebalancing, fault tolerance, task restart, chaining of jobs (dependencies), graceful restart, look-ahead or speculative execution
Map Reduce History: Lisp, Unix, Google FS
18. No SQL Options
HBase / Cassandra – for huge data volumes (PBs)
•HBase fits in well where Hadoop is already being used
•Cassandra is less cumbersome to install/manage
MongoDB / CouchDB – document-oriented databases for easy use and GB–TB volumes; might be problematic at PB scales
Neo4j and similar graph databases – for managing relationship-oriented applications (nodes and edges)
Riak, Redis, Membase – simple key-value databases for huge distributed in-memory hash maps
19. Let us Think Hadoop
20. RDBMS and Hadoop
            RDBMS                  MapReduce
Data size   Gigabytes              Petabytes
Access      Interactive and batch  Batch
Structure   Fixed schema           Unstructured schema
Language    SQL                    Procedural (Java, C++, Ruby, etc.)
Integrity   High                   Low
Scaling     Nonlinear              Linear
Updates     Read and write         Write once, read many times
Latency     Low                    High
21. Apache Hadoop Ecosystem
Hadoop Common: the common utilities that support the other Hadoop subprojects.
HDFS: a distributed file system that provides high-throughput access to application data.
MapReduce: a software framework for distributed processing of large data sets on compute clusters.
Pig: a high-level data-flow language and execution framework for parallel computation.
HBase: a scalable, distributed database that supports structured data storage for large tables.
Hive: a data warehouse infrastructure that provides data summarization and ad hoc querying.
ZooKeeper: a high-performance coordination service for distributed applications.
Flume / Scribe: distributed log and message collection and processing.
Mahout: scalable machine learning algorithms using Hadoop.
Chukwa: a data collection system for managing large distributed systems.
23. HDFS – The Backbone
Hadoop Distributed File System
24. Map Reduce – The New Paradigm
Transforming large data.
MapReduce basics:
•Functional programming
•List processing
•Mapping lists
Mappers and Reducers
25. PIG – Help the Business User Query
Pig: Data-aggregation functions over semi-structured data (log files).
Pig Latin program → Query Parser → Logical Plan
→ Semantic Checking → Logical Plan
→ Logical Optimizer → Optimized Logical Plan
→ Logical-to-Physical Translator → Physical Plan
→ Physical-to-M/R Translator → MapReduce Plan
→ MapReduce Launcher: creates a job jar to be submitted to the Hadoop cluster
27. HBASE – Scalable Columnar
• Scalable, Reliable, Distributed DB
• Columnar Structure
• Built on top of HDFS
• Map Reduceable
• NOT a SQL database!
– No joins
– No sophisticated query engine
– No transactions
– No column typing
– No SQL, no ODBC/JDBC, etc.
• Not a replacement for your RDBMS...
28. HIVE – SQL Like
• A high level interface on Hadoop for managing and
querying structured data
• Interpreted as Map-Reduce jobs for execution
• Uses HDFS for storage
• Uses a metadata representation over HDFS files
• Key Building Principles:
• Familiarity with SQL
• Performance with help of built-in optimizers
• Enable Extensibility – Types, Functions, Formats,
Scripts
29. FLUME – Distributed Data Collection
• Distributed Data / Log Collection Service
• Scalable, Configurable, Extensible
• Centrally Manageable
• Agents fetch data from apps, Collectors save it
• Abstractions: Source -> Decorator(s) -> Sink
30. Oozie – Workflow Management
An Oozie Workflow (diagram; recoverable structure):
start → HOD Alloc → fork → { M/R streaming job, SSH } → join → Pig job
→ decision (MORE loops back; ENOUGH continues) → M/R job → Java Main → FS → end;
any ERROR transition routes to kill.
31. Think Map n Reduce
34. Map Reduce Paradigm
Job
Configure the Hadoop Job to run.
Mapper
map(LongWritable key, Text value, Context context)
Reducer
reduce(Text key, Iterable<IntWritable> values, Context context)
35. Programming model
Map –Reduce Definition
MapReduce is a
functional programming model and an
associated implementation model
for processing and generating large data sets.
Users specify a map function that processes a key/value pair
to generate a set of intermediate key/value pairs,
and
a reduce function that merges all intermediate values associated
with the same intermediate key.
Many real world tasks are expressible in this model.
36. Programming model
Input & Output: each a set of key/value pairs
Programmer specifies two functions:
map (in_key, in_value) -> list(out_key, intermediate_value)
• Processes input key/value pair
• Produces set of intermediate pairs
reduce (out_key, list(intermediate_value)) -> list(out_value)
• Combines all intermediate values for a particular key
• Produces a set of merged output values (usually just one)
Inspired by similar primitives in LISP and other languages
37. Map Reduce Paradigm
Word Count Example
A simple MapReduce program can be written to determine how many times different words
appear in a set of files.
What do the Mapper and the Reducer do?
Pseudo Code:
mapper (filename, file-contents):
  for each word in file-contents:
    emit (word, 1)

reducer (word, values):
  sum = 0
  for each value in values:
    sum = sum + value
  emit (word, sum)
38. Programming model
Example: Count word occurrences
map(String input_key, String input_value):
  // input_key: document name
  // input_value: document contents
  for each word w in input_value:
    EmitIntermediate(w, "1");

reduce(String output_key, Iterator intermediate_values):
  // output_key: a word
  // output_values: a list of counts
  int result = 0;
  for each v in intermediate_values:
    result += ParseInt(v);
  Emit(AsString(result));
Pseudocode: See appendix in paper for real code
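The word-count pseudocode above can be sketched as plain, self-contained Java. This is not the Hadoop API: it simulates the map → shuffle (group by key) → reduce contract in-process, with illustrative names (`WordCountSim`, `run`) chosen for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class WordCountSim {

    // map: (document name, document contents) -> list of (word, 1) pairs
    static List<Map.Entry<String, Integer>> map(String docName, String contents) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : contents.trim().split("\\s+")) {
            out.add(Map.entry(word, 1));
        }
        return out;
    }

    // reduce: (word, list of counts) -> total count for that word
    static int reduce(String word, List<Integer> values) {
        int sum = 0;
        for (int v : values) sum += v;
        return sum;
    }

    // Driver: run map over each document, group intermediate pairs by key
    // (the "shuffle"), then run reduce once per distinct key.
    static Map<String, Integer> run(Map<String, String> docs) {
        Map<String, List<Integer>> shuffled = new TreeMap<>();
        for (Map.Entry<String, String> doc : docs.entrySet()) {
            for (Map.Entry<String, Integer> kv : map(doc.getKey(), doc.getValue())) {
                shuffled.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
            }
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : shuffled.entrySet()) {
            result.put(e.getKey(), reduce(e.getKey(), e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> docs = Map.of(
                "doc1", "the quick brown fox",
                "doc2", "the lazy dog and the fox");
        System.out.println(run(docs)); // e.g. fox=2, the=3, all other words 1
    }
}
```

In real Hadoop the shuffle happens across the cluster between the Mapper and Reducer tasks; here it is just an in-memory group-by, which is enough to see the contract.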
39. Understanding Map Reduce Paradigm
Map – Reduce Execution Recap
• Master-Slave architecture
• Master: JobTracker
– Accepts MR jobs submitted by users
– Assigns Map and Reduce tasks to TaskTrackers (slaves)
– Monitors task and TaskTracker status, re-executes tasks upon failure
• Worker: TaskTrackers
– Run Map and Reduce tasks upon instruction from the JobTracker
– Manage storage and transmission of intermediate output
40. Understanding Map Reduce Paradigm
Map – Reduce Paradigm Recap
Example of map functions –
Individual Count, Filter, Transformation, Sort, Pig load
Example of reduce functions –
Group Count, Sum, Aggregator
A job can have many map and reduce functions.
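As a minimal Java sketch of the non-counting cases listed above (illustrative names and data, not the Hadoop API): the map phase acting as a filter plus transformation, and the reduce phase acting as a per-key sum aggregator.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class FilterSumSim {

    record Txn(String account, int amount) {}

    // Map phase used as a filter + transformation: keep only positive
    // amounts, emitting (account, amount) intermediate pairs.
    static List<Map.Entry<String, Integer>> mapPhase(List<Txn> txns) {
        return txns.stream()
                   .filter(t -> t.amount() > 0)                  // filter
                   .map(t -> Map.entry(t.account(), t.amount())) // transform
                   .collect(Collectors.toList());
    }

    // Reduce phase used as an aggregator: sum the values for each key.
    static Map<String, Integer> reducePhase(List<Map.Entry<String, Integer>> pairs) {
        return pairs.stream().collect(Collectors.groupingBy(
                Map.Entry::getKey,
                Collectors.summingInt(Map.Entry::getValue)));
    }

    public static void main(String[] args) {
        List<Txn> txns = List.of(
                new Txn("a1", 100), new Txn("a1", -40),
                new Txn("a2", 250), new Txn("a1", 60));
        System.out.println(reducePhase(mapPhase(txns))); // a1 -> 160, a2 -> 250
    }
}
```

The same shape covers group count (emit 1 instead of the amount) or sort (emit the sort key and let the shuffle order the keys).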
41. How are we doing on the Objective
42. Process, Data and Domain driven Information Excellence
ABOUT COMPEGENCE
43. Process, Data and Domain
Integrated Approach
[Diagram: Actionable Decisions, Usable Processes, Timely Information,
Flexible Usage, Scalable Cost and Sustainable Effort, spanning Market Actions,
Systemic Changes, Business Landscape, Business Systems, Business Intent,
Domain, Data, and Skills & Competency.]
Decision Excellence: competitive advantage lies in the exploitation of:
– More detailed and specific information
– More comprehensive external data & dependencies
– Fuller integration
– More in-depth analysis
– More insightful plans and strategies
– More rapid response to business events
– More precise and apt response to customer events
We complement your “COMPETING WITH ANALYTICS JOURNEY”
44. Value Proposition
[Diagram: source, custom, partner and reference data are extracted, staged,
transformed and loaded into an integrated, trusted foundation, with a metadata
layer for consistent business understanding and data quality / process audit
throughout; a repeatable, reusable platform drives dashboards, scorecards,
segmentation, profiling, reports, Excel / DW interfaces and CRM / marketing
programs, weighing constraints (cost of ownership, tools, technologies,
platforms, people, processes, trade-offs) against decisions, actions, results
and returns.]
Jump Start the “Process and Information Excellence” journey
Focus on your business goals and “Competing with Analytics Journey”
Overcome multiple and diverse expertise / skill-set paucity
Preserve current investments in people and technology
Manage Data complexities and the resultant challenges
Manage Scalability to address data explosion with Terabytes of Data
Helps you focus on the business and business processes
Helps you harvest the benefits of your data investments faster
Consultative work-thru workshops that help mature your team
45. Our Expertise and Focus Areas
Process + Data + Domain => Decision
Analytics; Data Mining; Big Data; DWH & BI
Architecture and Methodology
Partnered Product Development
Consulting, Competency Building, Advisory, Mentoring
Executive Briefing Sessions and Deep Dive Workshops
46. Partners in Co-Creating Success
Process, Data and Domain driven Information Excellence
Process, Data and Domain driven Business Decision Life Cycle
www.compegence.com
info@compegence.com