Big Data App Server
Lance Riedel, CTO, The Hive (The Hive India event)
Big Data App Server
A new application framework for the four V's:
•  Volume of raw data (petabytes)
•  Velocity at which it is being generated/ingested
•  Variety of data sources and schemas
•  Advanced data science and analytics that can be applied to extract Value
Big Data App Server Use Cases
•  Log/Machine Analytics
•  Security/Fraud Detection
•  Sensor Data Analytics
•  Financial Analytics
•  Retail Analytics
•  Ad Targeting
•  Recommendation (e.g. Netflix, Amazon)
Big Data Platform: Components
App Server Components
Big Data Platform: Storage and Compute
Storage and Compute
Motivation
Google needed to capture the web and process it efficiently:
•  Calculate the importance of pages, words, and domains relative to one another
•  The more cost-effective they could make it, the more they could process, index, and understand
Storage/Compute: Centralized
•  Centralized doesn't scale!
•  Moving a lot of data creates a bottleneck
Storage/Compute: Sharding
•  Sharding splits the problem into isolated chunks
•  Sharding scales, but fails when you need to look across the data
•  E.g. how do you calculate term weights or top pages across shards?
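To make the cross-shard problem concrete, here is a small Python sketch with made-up page-view counts (the shard contents and function names are hypothetical). Taking each shard's local top page and then the best of those winners misses a page that is moderately popular on every shard; only merging the counts from all shards gives the correct global answer, which is exactly the "look across the data" step that isolated shards cannot do on their own.

```python
from collections import Counter

# Hypothetical page-view counts, already split across three shards.
shards = [
    {"a.com": 5, "b.com": 4},
    {"b.com": 4, "c.com": 6},
    {"b.com": 4, "a.com": 2},
]

def naive_top_page(shards):
    """Pick each shard's local top page, then the best of those local winners."""
    local_winners = [max(s.items(), key=lambda kv: kv[1]) for s in shards]
    return max(local_winners, key=lambda kv: kv[1])[0]

def global_top_page(shards):
    """Merge counts from every shard first, then pick the overall top page."""
    total = Counter()
    for s in shards:
        total.update(s)
    return total.most_common(1)[0][0]

print(naive_top_page(shards))   # c.com (6 views, but on a single shard)
print(global_top_page(shards))  # b.com (12 views in total)
```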
  
DFS, MapReduce
•  Uses a new programming model to distribute both computation AND data (NOT sharding)
•  Runs on commodity hardware
•  Failure resilience through software control
•  Easy to calculate across the entire corpus
•  Two parts of a complete solution:
    •  Distributed File System (DFS)
    •  MapReduce
Distributed File System
MapReduce
•  Process where the data resides (data and compute are local to each other)
•  Map: read the data, emit a key and a value
•  Reduce: group all values per key, then perform another operation on them
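The classic word-count job illustrates the Map and Reduce steps above. The following is a single-process Python sketch of the programming model, not Hadoop's Java API; the function names are illustrative. In a real cluster the framework runs map tasks next to the data blocks and performs the shuffle (grouping by key) between the two phases.

```python
from collections import defaultdict
from typing import Iterable, Iterator

def map_phase(line: str) -> Iterator[tuple[str, int]]:
    # Map: read one record (a line of text) and emit (word, 1) pairs.
    for word in line.lower().split():
        yield word, 1

def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    # Shuffle: group all emitted values by key (Hadoop does this between phases).
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    # Reduce: combine all values for one key; here, sum the counts.
    return key, sum(values)

def word_count(lines: Iterable[str]) -> dict[str, int]:
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())

docs = ["the quick fox", "the lazy dog", "The fox"]
print(word_count(docs))  # {'the': 3, 'quick': 1, 'fox': 2, 'lazy': 1, 'dog': 1}
```

Because each map call only sees one record and each reduce call only sees one key, both phases can be spread across many machines without coordination beyond the shuffle.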
  
Hadoop
•  Open-source implementation of Google's distributed file system (GFS) and MapReduce papers
•  Huge ecosystem
•  Used by: Yahoo, Facebook, Twitter, LinkedIn, Sears, Apple, The New York Times, Telefonica, and thousands more!
Big Data Platform: Management
Data Ingestion
Motivation
•  Data originates from a variety of sources
•  Some data is more valuable than others:
    •  Time-to-live (TTL)
    •  Guarantees on delivery
Data Ingestion: Apache Flume
•  A scalable, fault-tolerant data ingestion pipeline with a configurable topology that works hand in hand with the Hadoop ecosystem
•  Configurable delivery guarantees: routing, replication, failover
•  Extensible sources and sinks allow for pluggable data sources
•  Scales out horizontally to hundreds of thousands of messages/sec
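As a rough illustration of the failover guarantee (not Flume's actual API or configuration syntax), the Python sketch below tries a group of sinks in priority order and falls back to the next one when delivery fails. The class, sink names, and priority values are hypothetical.

```python
from typing import Callable, Sequence

Sink = Callable[[str], None]  # hypothetical: a sink delivers one event or raises

class FailoverSinkGroup:
    """Try sinks in priority order; fall back to the next one on failure."""

    def __init__(self, sinks: Sequence[tuple[int, str, Sink]]):
        # Higher priority number is tried first.
        self.sinks = sorted(sinks, key=lambda s: s[0], reverse=True)

    def deliver(self, event: str) -> str:
        errors = []
        for _priority, name, sink in self.sinks:
            try:
                sink(event)
                return name  # delivered; report which sink accepted it
            except Exception as exc:
                errors.append(f"{name}: {exc}")
        raise RuntimeError("all sinks failed: " + "; ".join(errors))

def hdfs_sink(event: str) -> None:
    raise ConnectionError("primary unavailable")  # simulate an outage

backup_log: list[str] = []

def backup_sink(event: str) -> None:
    backup_log.append(event)

group = FailoverSinkGroup([(10, "hdfs", hdfs_sink), (5, "backup", backup_sink)])
print(group.deliver("login user=42"))  # backup
```

Routing and replication fit the same shape: instead of stopping at the first sink that succeeds, a router picks a sink per event and a replicator sends the event to every sink.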
  
Workflow
Motivation
Transforming, storing, and joining data can take many steps that need to be repeatable and traceable: the workflow is, in effect, the programming model for data.
Workflow: Oozie
A workflow engine that understands the dependency graph of work and can schedule, replay, and report on the steps
•  Jobs triggered by time (frequency) and data availability
•  Integrated with the rest of the Hadoop stack
•  Scalable, reliable, and extensible system
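Oozie itself defines workflows in XML; the Python sketch below only illustrates the underlying idea of scheduling from a dependency graph, using the standard-library `graphlib` module (Python 3.9+). The step names are hypothetical. `run_order` gives one valid sequential order, and `ready_batches` groups steps into waves whose dependencies are all satisfied, so each wave could run in parallel.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: each step maps to the set of steps it depends on.
workflow = {
    "ingest_logs": set(),
    "ingest_users": set(),
    "clean_logs": {"ingest_logs"},
    "join": {"clean_logs", "ingest_users"},
    "report": {"join"},
}

def run_order(graph):
    """Return an execution order in which every step runs after its dependencies."""
    return list(TopologicalSorter(graph).static_order())

def ready_batches(graph):
    """Group steps into waves that could run in parallel."""
    ts = TopologicalSorter(graph)
    ts.prepare()  # also raises CycleError if the graph has a cycle
    batches = []
    while ts.is_active():
        ready = sorted(ts.get_ready())
        batches.append(ready)
        ts.done(*ready)  # mark the wave finished, unlocking dependent steps
    return batches

print(ready_batches(workflow))
# [['ingest_logs', 'ingest_users'], ['clean_logs'], ['join'], ['report']]
```

Replaying a failed run amounts to re-running the graph from the failed step onward, since the graph records exactly which downstream steps depend on it.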
  
	
  	
  
	
  
	
  
	
  
Schema Management
Motivation
As data sources explode, the need to understand the data schemas becomes a principal concern
Schema: HCatalog
•  A table and storage management layer for Hadoop
•  Enables users with different data processing tools (Pig, MapReduce, and Hive) to more easily read and write data on the grid
Schema: Avro
	
  
•  A data serialization system
•  When Avro data is stored in a file, its schema is stored with it
•  Correspondences between same-named fields, missing fields, extra fields, etc. can all be easily resolved
•  Most technologies in the Hadoop stack understand Avro, which enables interoperability and data passing
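The field-matching part of Avro's schema resolution can be sketched in a few lines of pure Python. This is a simplified illustration, not the Avro library: the schemas and the `resolve` helper are made up, and it ignores type promotion, unions, and nested records. Fields are matched by name; a field only the writer has is dropped; a field only the reader has takes its default, and it is an error if there is no default.

```python
from typing import Any

# Writer and reader schemas shaped like Avro record schemas (JSON-style dicts).
writer_schema = {
    "type": "record", "name": "User",
    "fields": [
        {"name": "id", "type": "long"},
        {"name": "email", "type": "string"},
        {"name": "legacy_flag", "type": "boolean"},
    ],
}
reader_schema = {
    "type": "record", "name": "User",
    "fields": [
        {"name": "id", "type": "long"},
        {"name": "email", "type": "string"},
        {"name": "country", "type": "string", "default": "unknown"},
    ],
}

def resolve(record: dict[str, Any], writer: dict, reader: dict) -> dict[str, Any]:
    """Project a record written with `writer` onto `reader`, matching fields by name."""
    written = {f["name"] for f in writer["fields"]}
    result = {}
    for field in reader["fields"]:
        name = field["name"]
        if name in written:
            result[name] = record[name]        # same-named field: copy it
        elif "default" in field:
            result[name] = field["default"]    # missing in writer: use default
        else:
            raise ValueError(f"no value or default for field {name!r}")
    return result                              # writer-only fields are dropped

old = {"id": 7, "email": "a@example.com", "legacy_flag": True}
print(resolve(old, writer_schema, reader_schema))
# {'id': 7, 'email': 'a@example.com', 'country': 'unknown'}
```

Because the writer's schema travels with the file, a reader using a newer schema always has both sides available to perform this matching.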
  
	
  
Big Data Platform: Data Access, Querying
Data Access
Motivation
Various data access patterns require data stores beyond plain DFS files. One example is a key-value store that needs random access to data.
Solution(s)
There are a number of solutions, depending on the use case:
•  Stores based on Google's BigTable whitepaper
•  SQL adapted to Hadoop
Data Access: HBase
•  The Hadoop database: a distributed, scalable big data store (a sorted map) modeled on Google's BigTable and backed by the Hadoop DFS
•  Linear and modular scalability
•  Automatic and configurable sharding of tables
•  Automatic failover support
•  Convenient base classes for backing Hadoop MapReduce jobs with Apache HBase tables
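HBase's data model is a sorted map from row key to columns. The toy Python class below is a hypothetical in-memory stand-in, not the HBase client API; it shows why that ordering matters: point lookups by key and range scans over a contiguous block of keys are both cheap. HBase splits this sorted key space into regions, which is what its automatic sharding distributes across servers.

```python
import bisect
from typing import Iterator, Optional

class SortedMapTable:
    """Toy in-memory table: rows kept sorted by row key, like a sorted map."""

    def __init__(self) -> None:
        self._keys: list[str] = []
        self._rows: dict[str, dict[str, str]] = {}

    def put(self, row_key: str, column: str, value: str) -> None:
        if row_key not in self._rows:
            bisect.insort(self._keys, row_key)  # keep keys in sorted order
            self._rows[row_key] = {}
        self._rows[row_key][column] = value

    def get(self, row_key: str) -> Optional[dict[str, str]]:
        # Random access by row key.
        return self._rows.get(row_key)

    def scan(self, start: str, stop: str) -> Iterator[tuple[str, dict[str, str]]]:
        # Range scan over [start, stop): cheap because keys are sorted.
        i = bisect.bisect_left(self._keys, start)
        while i < len(self._keys) and self._keys[i] < stop:
            key = self._keys[i]
            yield key, self._rows[key]
            i += 1

t = SortedMapTable()
t.put("user#002", "info:name", "Bo")
t.put("user#001", "info:name", "Al")
t.put("user#010", "info:name", "Cy")
print([k for k, _ in t.scan("user#000", "user#005")])  # ['user#001', 'user#002']
```

The zero-padded keys are a deliberate design choice: since ordering is lexicographic, padding keeps `user#010` after `user#002`.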
  
Data Access: SQL – Hive, Impala
•  SQL querying of raw data on the distributed file system
•  Impala: queries files on HDFS, including SELECT, JOIN, and aggregate functions, in real time
•  Hive: provides easy data summarization, ad-hoc queries, and analysis of large datasets stored in Hadoop-compatible file systems
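The kind of query Hive and Impala run looks like ordinary SQL. In the sketch below, Python's built-in `sqlite3` stands in for them purely to show the query shape (SELECT, JOIN, and an aggregate); the tables and rows are made up. In Hive or Impala the tables would be defined over files in HDFS rather than created in an in-memory database.

```python
import sqlite3

# In-memory SQLite database standing in for tables defined over HDFS files.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE page_views (page TEXT, user_id INTEGER);
    CREATE TABLE users (user_id INTEGER, country TEXT);
    INSERT INTO page_views VALUES ('home', 1), ('home', 2), ('cart', 1), ('home', 3);
    INSERT INTO users VALUES (1, 'IN'), (2, 'US'), (3, 'IN');
""")

# SELECT + JOIN + aggregate: page views per country.
query = """
    SELECT u.country, COUNT(*) AS views
    FROM page_views AS v
    JOIN users AS u ON v.user_id = u.user_id
    GROUP BY u.country
    ORDER BY views DESC
"""
rows = conn.execute(query).fetchall()
print(rows)  # [('IN', 3), ('US', 1)]
```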
  
Big Data Platform: Analytics
Data Analytics
Motivation
•  Discover the latent value of the data: the core motivation behind Big Data!
•  Clustering, machine learning, correlations, modeling: the guts of data science, often with extremely diverse use cases
Solution(s)
A pluggable architecture that can share schemas but allows for a suite of tools appropriate to the use case
Data Analytics: Example Frameworks
•  Mahout
    •  Machine learning, clustering
•  Pattern: machine learning DSL for Hadoop from Cascading
•  0xData
    •  Open-source math and prediction engine for big data
•  Sample algorithms
    •  Random forest
    •  K-means clustering
    •  Hierarchical clustering
    •  Linear regression
    •  Logistic regression
    •  Support vector machines
    •  Artificial neural networks
    •  Association rule learning
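To give a flavor of one of the listed algorithms, here is a minimal pure-Python k-means (Lloyd's algorithm) on 2-D points. It is a single-machine sketch with made-up data and hand-picked initial centroids, not the API of Mahout or any other framework listed above; distributed implementations typically run each assign-then-average iteration as a parallel pass over the data.

```python
import math
from typing import Sequence

Point = tuple[float, float]

def kmeans(points: Sequence[Point], centroids: Sequence[Point], iterations: int = 10):
    """Lloyd's k-means: assign each point to its nearest centroid, then re-average."""
    centroids = list(centroids)
    clusters: list[list[Point]] = []
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its cluster (keep it if the cluster is empty).
        centroids = [
            (sum(x for x, _ in c) / len(c), sum(y for _, y in c) / len(c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters

pts = [(1, 1), (1.5, 2), (1, 0), (8, 8), (9, 8.5), (8, 9)]
centers, groups = kmeans(pts, centroids=[(0, 0), (10, 10)])
print(centers)  # roughly [(1.17, 1.0), (8.33, 8.5)]
```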
  
Big Data Platform: Serving
Serving
Motivation
•  Powering applications for end users
•  Search/browse and recommendation engines allow real-time access to data
Serving: Search – SolrCloud
•  Builds indexes on top of Hadoop
•  Horizontally scalable, fault tolerant
•  Incredible flexibility in indexing options
    •  Tokenization
    •  Field types
    •  Data storage
•  Search options just as flexible
    •  AND, OR, NOT, wildcard
    •  Facets (counts from a derived ontology)
•  Extensive algorithm and weighting pluggability
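Solr's core data structure is an inverted index. The Python sketch below is a toy version with hypothetical documents and helper names (it is not Solr's query syntax or API): it tokenizes text, builds a term → document-id map, answers AND / OR / NOT and trailing-wildcard queries with set operations, and computes facet counts over a field.

```python
import re
from collections import Counter, defaultdict

# Hypothetical documents: free text plus a "category" field used for faceting.
docs = {
    1: {"text": "Hadoop cluster failover", "category": "ops"},
    2: {"text": "Hadoop SQL query engine", "category": "sql"},
    3: {"text": "Cluster query tuning", "category": "sql"},
}

def tokenize(text):
    # Simple tokenization: lowercase runs of letters and digits.
    return re.findall(r"[a-z0-9]+", text.lower())

# Inverted index: term -> set of ids of documents containing that term.
index = defaultdict(set)
for doc_id, doc in docs.items():
    for term in tokenize(doc["text"]):
        index[term].add(doc_id)

def match_all(*terms):  # AND (expects at least one term)
    return set.intersection(*(index.get(t, set()) for t in terms))

def match_any(*terms):  # OR
    return set().union(*(index.get(t, set()) for t in terms))

def exclude(ids):       # NOT, relative to the whole collection
    return set(docs) - ids

def prefix(p):          # trailing wildcard, e.g. "clu*"
    return match_any(*(t for t in index if t.startswith(p)))

def facet(ids, field="category"):
    # Facet counts: number of matching documents per value of `field`.
    return Counter(docs[i][field] for i in ids)

hits = match_any("cluster", "query")
print(sorted(hits), facet(hits))
```

Real engines add relevance scoring on top of these set operations, which is where Solr's pluggable weighting comes in.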
  
Serving: Manas – Matching Engine
•  The Hive's massively scalable matching engine
•  Efficiently handles hundreds of millions to billions of documents while matching against hundreds to thousands of features
•  Nothing in the open-source community today has these capabilities
Example App Use Case
App Server Data Flow
SecurityX on App Server

The Hive Think Tank - The Microsoft Big Data Stack by Raghu Ramakrishnan, CTO...The Hive
 

More from The Hive (20)

"Responsible AI", by Charlie Muirhead
"Responsible AI", by Charlie Muirhead"Responsible AI", by Charlie Muirhead
"Responsible AI", by Charlie Muirhead
 
Translating a Trillion Points of Data into Therapies, Diagnostics, and New In...
Translating a Trillion Points of Data into Therapies, Diagnostics, and New In...Translating a Trillion Points of Data into Therapies, Diagnostics, and New In...
Translating a Trillion Points of Data into Therapies, Diagnostics, and New In...
 
Digital Transformation; Digital Twins for Delivering Business Value in IIoT
Digital Transformation; Digital Twins for Delivering Business Value in IIoTDigital Transformation; Digital Twins for Delivering Business Value in IIoT
Digital Transformation; Digital Twins for Delivering Business Value in IIoT
 
Quantum Computing (IBM Q) - Hive Think Tank Event w/ Dr. Bob Sutor - 02.22.18
Quantum Computing (IBM Q) - Hive Think Tank Event w/ Dr. Bob Sutor - 02.22.18Quantum Computing (IBM Q) - Hive Think Tank Event w/ Dr. Bob Sutor - 02.22.18
Quantum Computing (IBM Q) - Hive Think Tank Event w/ Dr. Bob Sutor - 02.22.18
 
The Hive Think Tank: Rendezvous Architecture Makes Machine Learning Logistics...
The Hive Think Tank: Rendezvous Architecture Makes Machine Learning Logistics...The Hive Think Tank: Rendezvous Architecture Makes Machine Learning Logistics...
The Hive Think Tank: Rendezvous Architecture Makes Machine Learning Logistics...
 
Data Science in the Enterprise
Data Science in the EnterpriseData Science in the Enterprise
Data Science in the Enterprise
 
AI in Software for Augmenting Intelligence Across the Enterprise
AI in Software for Augmenting Intelligence Across the EnterpriseAI in Software for Augmenting Intelligence Across the Enterprise
AI in Software for Augmenting Intelligence Across the Enterprise
 
“ High Precision Analytics for Healthcare: Promises and Challenges” by Sriram...
“ High Precision Analytics for Healthcare: Promises and Challenges” by Sriram...“ High Precision Analytics for Healthcare: Promises and Challenges” by Sriram...
“ High Precision Analytics for Healthcare: Promises and Challenges” by Sriram...
 
"The Future of Manufacturing" by Sujeet Chand, SVP&CTO, Rockwell Automation
"The Future of Manufacturing" by Sujeet Chand, SVP&CTO, Rockwell Automation"The Future of Manufacturing" by Sujeet Chand, SVP&CTO, Rockwell Automation
"The Future of Manufacturing" by Sujeet Chand, SVP&CTO, Rockwell Automation
 
Social Impact & Ethics of AI by Steve Omohundro
Social Impact & Ethics of AI by Steve OmohundroSocial Impact & Ethics of AI by Steve Omohundro
Social Impact & Ethics of AI by Steve Omohundro
 
The Hive Think Tank: AI in The Enterprise by Venkat Srinivasan
The Hive Think Tank: AI in The Enterprise by Venkat SrinivasanThe Hive Think Tank: AI in The Enterprise by Venkat Srinivasan
The Hive Think Tank: AI in The Enterprise by Venkat Srinivasan
 
The Hive Think Tank: Machine Learning Applications in Genomics by Prof. Jian ...
The Hive Think Tank: Machine Learning Applications in Genomics by Prof. Jian ...The Hive Think Tank: Machine Learning Applications in Genomics by Prof. Jian ...
The Hive Think Tank: Machine Learning Applications in Genomics by Prof. Jian ...
 
The Hive Think Tank: The Future Of Customer Support - AI Driven Automation
The Hive Think Tank: The Future Of Customer Support - AI Driven AutomationThe Hive Think Tank: The Future Of Customer Support - AI Driven Automation
The Hive Think Tank: The Future Of Customer Support - AI Driven Automation
 
The Hive Think Tank: Talk by Mohandas Pai - India at 2030, How Tech Entrepren...
The Hive Think Tank: Talk by Mohandas Pai - India at 2030, How Tech Entrepren...The Hive Think Tank: Talk by Mohandas Pai - India at 2030, How Tech Entrepren...
The Hive Think Tank: Talk by Mohandas Pai - India at 2030, How Tech Entrepren...
 
The Hive Think Tank: The Content Trap - Strategist's Guide to Digital Change
The Hive Think Tank: The Content Trap - Strategist's Guide to Digital ChangeThe Hive Think Tank: The Content Trap - Strategist's Guide to Digital Change
The Hive Think Tank: The Content Trap - Strategist's Guide to Digital Change
 
Deep Visual Understanding from Deep Learning by Prof. Jitendra Malik
Deep Visual Understanding from Deep Learning by Prof. Jitendra MalikDeep Visual Understanding from Deep Learning by Prof. Jitendra Malik
Deep Visual Understanding from Deep Learning by Prof. Jitendra Malik
 
The Hive Think Tank: Heron at Twitter
The Hive Think Tank: Heron at TwitterThe Hive Think Tank: Heron at Twitter
The Hive Think Tank: Heron at Twitter
 
The Hive Think Tank: Unpacking AI for Healthcare
The Hive Think Tank: Unpacking AI for Healthcare The Hive Think Tank: Unpacking AI for Healthcare
The Hive Think Tank: Unpacking AI for Healthcare
 
The Hive Think Tank: Translating IoT into Innovation at Every Level by Prith ...
The Hive Think Tank: Translating IoT into Innovation at Every Level by Prith ...The Hive Think Tank: Translating IoT into Innovation at Every Level by Prith ...
The Hive Think Tank: Translating IoT into Innovation at Every Level by Prith ...
 
The Hive Think Tank - The Microsoft Big Data Stack by Raghu Ramakrishnan, CTO...
The Hive Think Tank - The Microsoft Big Data Stack by Raghu Ramakrishnan, CTO...The Hive Think Tank - The Microsoft Big Data Stack by Raghu Ramakrishnan, CTO...
The Hive Think Tank - The Microsoft Big Data Stack by Raghu Ramakrishnan, CTO...
 

Recently uploaded

What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?Antenna Manufacturer Coco
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfhans926745
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessPixlogix Infotech
 

Recently uploaded (20)

What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdf
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 

Big Data App Server by Lance Riedel, CTO, The Hive, for The Hive India event

  • 1. Big Data App Server
    Lance Riedel
  • 2. Big Data App Server
    A new application framework for the 4 V's:
    • Volume of raw data (petabytes)
    • Velocity at which it is being generated/ingested
    • Variety of data sources and schemas
    • Advanced data science and analytics that can be applied to extract Value
  • 4. Big Data App Server Use Cases
    • Log/Machine Analytics
    • Security/Fraud Detection
    • Sensor Data Analytics
    • Financial Analytics
    • Retail Analytics
    • Ad Targeting
    • Recommendation (e.g. Netflix, Amazon)
  • 8. Storage and Compute
    Motivation
    Google needed to capture the web and process it efficiently:
    • Calculate the importance of pages, words, and domains relative to each other
    • The more cost-effective they could make it, the more they could process, index, and understand
  • 9. Storage/Compute: Centralized
    • Centralized doesn't scale!
    • Moving a lot of data to one place creates a bottleneck
  • 10. Storage/Compute: Sharding
    • Sharding splits the problem into isolated chunks
    • Sharding scales, but fails when you need to look across the data
    • E.g. how do you calculate term weights or top pages across shards?
    [Diagram: several independent shards, each marked ✓, and a ≠ sign]
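To make the cross-shard problem concrete, here is a minimal Python sketch using toy documents and hypothetical names. It compares inverse document frequency (IDF), a common term weight, computed inside each shard against IDF computed over the whole corpus. The shard-local weights disagree with each other and with the global value, so a per-shard answer cannot simply be reused.

```python
import math

# Toy corpus split into two shards (hypothetical data for illustration).
shards = [
    ["big data app", "big data server"],        # shard 0
    ["app server", "search server", "data"],    # shard 1
]

def idf(term, docs):
    """IDF = log(N / df), where df is the number of documents containing the term."""
    df = sum(1 for doc in docs if term in doc.split())
    return math.log(len(docs) / df) if df else float("inf")

# Each shard computes the weight from its own documents only.
local = [idf("data", docs) for docs in shards]

# The correct weight needs document counts from the whole corpus.
corpus = [doc for docs in shards for doc in docs]
global_weight = idf("data", corpus)

print("per-shard IDF:", [round(w, 3) for w in local])
print("global IDF:   ", round(global_weight, 3))
```

The global value can be recovered by summing per-shard document counts and document frequencies before taking the log. That cross-shard aggregation step is what the MapReduce model on the following slides makes routine.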
  • 11. DFS, MapReduce
    • A new programming model that distributes computation AND data (NOT sharding)
    • Runs on commodity hardware
    • Failure resilience through software control
    • Easy to calculate across the whole corpus
    • Two parts of a complete solution:
      • Distributed File System (DFS)
      • MapReduce
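A distributed file system stores each file as fixed-size blocks and copies every block to several machines, so losing one machine loses no data. The Python sketch below illustrates only that idea. The tiny block size, the node names and the round-robin placement are assumptions for illustration; this is not HDFS's actual (rack-aware) placement policy.

```python
BLOCK_SIZE = 4       # bytes per block (tiny, for illustration; real DFS blocks are far larger)
REPLICATION = 3      # copies kept of every block
NODES = ["node1", "node2", "node3", "node4"]  # hypothetical cluster

def split_into_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> list[bytes]:
    """Cut a file into fixed-size blocks; the last block may be shorter."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

def place_replicas(num_blocks: int, nodes=NODES, replication=REPLICATION) -> dict[int, list[str]]:
    """Assign each block to `replication` distinct nodes, rotating the starting node."""
    placement = {}
    for b in range(num_blocks):
        start = b % len(nodes)
        placement[b] = [nodes[(start + k) % len(nodes)] for k in range(replication)]
    return placement

def readable_after_failure(placement: dict[int, list[str]], failed: str) -> bool:
    """A block stays readable if at least one replica lives on a surviving node."""
    return all(any(n != failed for n in replicas) for replicas in placement.values())

data = b"hello distributed file system"
blocks = split_into_blocks(data)
placement = place_replicas(len(blocks))
for b, replicas in placement.items():
    print(b, blocks[b], replicas)
print("survives loss of node2:", readable_after_failure(placement, "node2"))
```

Because compute tasks can be scheduled on any node holding a replica, the same placement map also tells a MapReduce scheduler where to run work next to the data.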
  • 13. MapReduce
    • Process data where it resides (data and compute are local to each other)
    • Map: read the data, emit a key and a value
    • Reduce: group all values per key, then perform another operation on them
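The Map and Reduce steps above can be sketched in a single Python process with the classic word-count example. In a real Hadoop job, the framework does the distributed work:

- It runs many map tasks next to the data blocks.
- It performs the group-by-key (shuffle) across the network.
- It runs reducers in parallel.

This sketch only simulates that flow in memory. The function names are illustrative, not Hadoop's API.

```python
from collections import defaultdict
from typing import Iterable, Iterator

def map_phase(line: str) -> Iterator[tuple[str, int]]:
    """Map: read one input record and emit (key, value) pairs, here (word, 1)."""
    for word in line.lower().split():
        yield word, 1

def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle: group every value emitted for the same key (done by the framework)."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce: combine all values for one key, here by summing the counts."""
    return key, sum(values)

# Each "split" stands in for a DFS block processed on the node that stores it.
splits = ["big data app server", "big data platform", "app server"]

mapped = (pair for split in splits for pair in map_phase(split))
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(counts)
```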
  • 14. Hadoop
    • Open-source implementation of Google's distributed file system (GFS) and MapReduce papers
    • Huge ecosystem
    • Used by Yahoo, Facebook, Twitter, LinkedIn, Sears, Apple, The New York Times, Telefonica, and thousands more!
  • 16. Data Ingestion
    Motivation
    • Data originates from a variety of sources
    • Some data is more valuable than other data:
      • Time-to-live (TTL)
      • Guarantees on delivery
  • 17. Data Ingestion: Apache Flume
    • A scalable, fault-tolerant data ingestion pipeline with a configurable topology that works hand in hand with the Hadoop ecosystem
    • Configurable delivery guarantees: routing, replication, failover
    • Extensible sources and sinks allow pluggable data sources
    • Scales out horizontally to hundreds of thousands of messages/sec
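Flume moves events from a source through a buffering channel to a sink. The Python sketch below is a conceptual model of that source → channel → sink flow with a failover sink. `Channel`, `FailoverSink` and the sink functions are hypothetical names. This is not Flume's Java API or its configuration format.

```python
from __future__ import annotations

from collections import deque


class Channel:
    """Bounded FIFO buffer between a source and its sinks (a toy stand-in for a channel)."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.events: deque[str] = deque()

    def put(self, event: str) -> bool:
        if len(self.events) >= self.capacity:
            return False  # channel full: the source must retry or back off
        self.events.append(event)
        return True

    def take(self) -> str | None:
        return self.events.popleft() if self.events else None


class FailoverSink:
    """Try sinks in priority order; fall through to the next one if a sink fails."""

    def __init__(self, sinks):
        self.sinks = sinks  # list of (name, write_function) pairs

    def deliver(self, event: str) -> str:
        for name, write in self.sinks:
            try:
                write(event)
                return name
            except OSError:
                continue
        raise OSError("all sinks failed")


delivered = {"primary": [], "backup": []}

def primary(event: str) -> None:
    if event.startswith("bad"):
        raise OSError("primary unavailable")  # simulated outage for some events
    delivered["primary"].append(event)

def backup(event: str) -> None:
    delivered["backup"].append(event)

channel = Channel(capacity=10)
for e in ["login", "click", "bad-write", "logout"]:
    channel.put(e)

sink = FailoverSink([("primary", primary), ("backup", backup)])
while (event := channel.take()) is not None:
    sink.deliver(event)
print(delivered)
```

Real Flume wraps channel puts and takes in transactions, so an event leaves the channel only after the sink commits. This sketch omits that, which is why it cannot offer Flume's delivery guarantees.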
  • 18. Workflow
    Motivation
    Transforming, storing, and joining data can take many steps that need to be repeatable and traceable: the programming model for data.
  • 19. Workflow: Oozie
    A workflow engine that understands the dependency graph of work and can schedule, replay, and report on the steps
    • Jobs triggered by time (frequency) and data availability
    • Integrated with the rest of the Hadoop stack
    • Scalable, reliable, and extensible system
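Oozie workflows are defined in XML. The dependency-graph idea itself can be illustrated with Python's standard `graphlib` module (Python 3.9+); the job names below are hypothetical. `TopologicalSorter` releases jobs only once all their dependencies are done. Each call to `get_ready()` returns a "wave" of jobs that could run in parallel.

```python
from graphlib import TopologicalSorter

# Each job maps to the set of jobs it depends on (hypothetical pipeline).
workflow = {
    "ingest_logs": set(),
    "ingest_clicks": set(),
    "clean_logs": {"ingest_logs"},
    "join_sessions": {"clean_logs", "ingest_clicks"},
    "daily_report": {"join_sessions"},
}

sorter = TopologicalSorter(workflow)
sorter.prepare()  # also raises CycleError if the graph has a cycle

waves = []
while sorter.is_active():
    ready = sorted(sorter.get_ready())  # jobs whose dependencies have all finished
    waves.append(ready)
    for job in ready:
        # A real scheduler would launch the job here and record its outcome,
        # so the step can later be reported on or replayed.
        sorter.done(job)

for i, wave in enumerate(waves):
    print(f"wave {i}: {wave}")
```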
  • 20. Schema Management
    Motivation
    As data sources explode, understanding the data schemas becomes a principal concern.
  • 21. Schema: HCatalog
    • A table and storage management layer for Hadoop
    • Enables users of different data processing tools (Pig, MapReduce, and Hive) to more easily read and write data on the grid
  • 22. Schema: Avro
    • A data serialization system
    • When Avro data is stored in a file, its schema is stored with it
    • Differences between the schema the data was written with and the schema it is read with (same-named fields, missing fields, extra fields, etc.) can all be resolved easily
    • Most technologies in the Hadoop stack understand Avro, which aids interoperability and data passing
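Avro resolves a record written with one schema (the writer's) against the schema the reader expects, using three rules:

- Fields are matched by name.
- Writer fields the reader does not know are ignored.
- Reader fields missing from the data take the reader schema's default.

The pure-Python sketch below mimics only those record-level rules on plain dicts. It is not the `avro` library, and it ignores type promotion, aliases and nested types.

```python
# Writer and reader schemas in Avro's JSON form (written here as Python dicts).
writer_schema = {
    "type": "record", "name": "Click",
    "fields": [
        {"name": "user", "type": "string"},
        {"name": "url", "type": "string"},
        {"name": "referrer", "type": "string"},  # extra: the reader does not know it
    ],
}
reader_schema = {
    "type": "record", "name": "Click",
    "fields": [
        {"name": "user", "type": "string"},
        {"name": "url", "type": "string"},
        {"name": "country", "type": "string", "default": "unknown"},  # missing from data
    ],
}

def resolve(record: dict, writer: dict, reader: dict) -> dict:
    """Project a record written with `writer` onto `reader` (record-level rules only)."""
    written = {f["name"] for f in writer["fields"]}
    result = {}
    for field in reader["fields"]:
        name = field["name"]
        if name in written:
            result[name] = record[name]        # same-named field: copy the value
        elif "default" in field:
            result[name] = field["default"]    # missing field: use the reader's default
        else:
            raise ValueError(f"no value or default for field {name!r}")
    return result                              # extra writer fields are dropped

stored = {"user": "u42", "url": "/home", "referrer": "search"}
print(resolve(stored, writer_schema, reader_schema))
```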
  • 24. Data Access
    Motivation
    Various data access patterns require data stores beyond plain DFS files. One example is a key-value store that needs random access to data.
    Solution(s)
    There are a number of solutions, depending on the use case:
    • Google's Bigtable whitepaper
    • SQL adapted to Hadoop
  • 25. Data Access: HBase
    • The Hadoop database: a distributed, scalable big data store (a sorted map) modeled on Google's Bigtable and backed by the Hadoop DFS
    • Linear and modular scalability
    • Automatic and configurable sharding of tables
    • Automatic failover support
    • Convenient base classes for backing Hadoop MapReduce jobs with Apache HBase tables
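HBase keeps rows sorted by row key. A prefix or range scan therefore touches one contiguous slice of the table, and its automatic sharding splits tables into key ranges. The Python sketch below models the sorted-map part with a sorted list and `bisect`. The `user#date` row-key layout is a hypothetical design choice, and none of this is the HBase client API.

```python
import bisect

class SortedTable:
    """Minimal sorted map: row key -> columns, with range scans like an HBase table."""

    def __init__(self):
        self.keys: list[str] = []
        self.rows: dict[str, dict[str, str]] = {}

    def put(self, row_key: str, column: str, value: str) -> None:
        if row_key not in self.rows:
            bisect.insort(self.keys, row_key)  # keep row keys in sorted order
            self.rows[row_key] = {}
        self.rows[row_key][column] = value

    def get(self, row_key: str):
        return self.rows.get(row_key)  # random access by key

    def scan(self, start: str, stop: str):
        """Yield rows with start <= key < stop (stop exclusive, as in HBase scans)."""
        lo = bisect.bisect_left(self.keys, start)
        hi = bisect.bisect_left(self.keys, stop)
        for key in self.keys[lo:hi]:
            yield key, self.rows[key]

table = SortedTable()
table.put("user2#20130102", "page", "/search")
table.put("user1#20130101", "page", "/home")
table.put("user1#20130103", "page", "/cart")

# All rows for user1: "$" sorts right after "#", so this range covers the "user1#" prefix.
for key, cols in table.scan("user1#", "user1$"):
    print(key, cols)
```

Putting the user first in the key keeps each user's events adjacent, so a per-user history is one short scan rather than a full-table read.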
  • 26. Data Access: SQL – Hive, Impala
    • SQL querying of raw data on the distributed file system
    • Impala: query files on HDFS, including SELECT, JOIN, and aggregate functions, in real time
    • Hive: provides easy data summarization, ad-hoc queries, and analysis of large datasets stored in Hadoop-compatible file systems
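Hive and Impala accept ordinary SQL over tables whose data lives in files on HDFS. Below, Python's built-in `sqlite3` serves purely as a local stand-in to show a SELECT with a JOIN and an aggregate. The table names and rows are hypothetical. On a cluster, a statement of this shape would be submitted to Hive or Impala, not to SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE page_views (user_id TEXT, url TEXT);
    CREATE TABLE users (user_id TEXT, country TEXT);
    INSERT INTO page_views VALUES ('u1', '/home'), ('u1', '/cart'), ('u2', '/home');
    INSERT INTO users VALUES ('u1', 'IN'), ('u2', 'US');
""")

# SELECT + JOIN + aggregate: page views per country.
query = """
    SELECT u.country, COUNT(*) AS views
    FROM page_views v
    JOIN users u ON v.user_id = u.user_id
    GROUP BY u.country
    ORDER BY u.country
"""
rows = conn.execute(query).fetchall()
print(rows)
conn.close()
```

The engines differ in how they execute such a statement. Hive at the time compiled it into MapReduce jobs. Impala runs it on its own long-running daemons, which is why the slide calls Impala real-time.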
  • 28. Data Analytics
    Motivation
    • Discover the latent value of the data: the core motivation behind Big Data!
    • Clustering, machine learning, correlations, modeling: the guts of data science, often with extremely diverse use cases
    Solution(s)
    A pluggable architecture that can share schemas but allows a suite of tools appropriate to each use case
  • 29. Data Analytics: Example Frameworks
    • Mahout
      • Machine learning, clustering
    • Pattern
      • Machine learning DSL for Hadoop from Cascading
    • 0xData
      • Open-source math and prediction engine for big data
    • Sample algorithms
      • Random Forest
      • K-Means Clustering
      • Hierarchical Clustering
      • Linear Regression
      • Logistic Regression
      • Support Vector Machines
      • Artificial Neural Networks
      • Association Rule Learning
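K-Means clustering, one of the sample algorithms listed, alternates two steps. First it assigns each point to its nearest centroid; then it moves each centroid to the mean of its assigned points. The pure-Python sketch below runs that loop (Lloyd's algorithm) on hypothetical 2-D points with fixed starting centroids. Frameworks such as Mahout distribute the same two steps across a cluster, which this single-process version does not attempt.

```python
import math

def kmeans(points, centroids, max_iters=100):
    """Lloyd's algorithm: repeat assign/update until the centroids stop moving."""
    for _ in range(max_iters):
        # Assignment step: put every point in the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        new_centroids = [
            tuple(sum(coord) / len(c) for coord in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return centroids, clusters

points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.5)]
centroids, clusters = kmeans(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
print(centroids)
print(clusters)
```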
  • 31. Serving
    Motivation
    • Powering applications for end users
    • Search/browse and recommendation engines allow real-time access to data
  • 32. Serving: Search – SolrCloud
    • Builds indexes on top of Hadoop
    • Horizontally scalable, fault tolerant
    • Incredible flexibility in indexing options:
      • Tokenization
      • Field types
      • Data storage
    • Search options just as flexible:
      • AND, OR, NOT, wildcard
      • Facets (counts from a derived ontology)
      • Extensive algorithm and weighting pluggability
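Search engines like Solr answer queries from an inverted index, where each token maps to the set of documents containing it. AND, OR and NOT then become set intersection, union and difference. Facets are counts of a field's values over the matching documents. The Python sketch below shows only those ideas, using hypothetical documents and a naive lowercase/whitespace tokenizer. It is not Solr's API, its analyzers, or its relevance scoring.

```python
from collections import Counter, defaultdict

docs = {
    1: {"title": "Hadoop cluster setup", "category": "ops"},
    2: {"title": "Hadoop MapReduce tuning", "category": "dev"},
    3: {"title": "Solr cluster search", "category": "ops"},
    4: {"title": "MapReduce word count", "category": "dev"},
}

def tokenize(text: str) -> list[str]:
    """Naive tokenizer: lowercase and split on whitespace."""
    return text.lower().split()

# Inverted index: token -> set of document ids containing it.
index: dict[str, set[int]] = defaultdict(set)
for doc_id, doc in docs.items():
    for token in tokenize(doc["title"]):
        index[token].add(doc_id)

def search(must=(), should=(), must_not=()):
    """AND across `must`, OR across `should`, NOT for each term in `must_not`."""
    hits = set(docs)
    for term in must:
        hits &= index.get(term, set())
    if should:
        hits &= set().union(*(index.get(t, set()) for t in should))
    for term in must_not:
        hits -= index.get(term, set())
    return hits

def facet(hits, field):
    """Count the values of `field` among the matching documents."""
    return Counter(docs[d][field] for d in hits)

hits = search(must=["cluster"], must_not=["solr"])
print(sorted(hits), facet(hits, "category"))
hits = search(should=["hadoop", "mapreduce"])
print(sorted(hits), facet(hits, "category"))
```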
  • 33. Serving: Manas – Matching Engine
    • The Hive's massively scalable matching engine
    • Efficiently handles hundreds of millions to billions of documents while matching against hundreds to thousands of features
    • Nothing exists today in the open source community with these capabilities