Unleashing the power of Apache Atlas with Apache Ranger
Virtual Data Connector Project
NIGEL JONES
JONESN@UK.IBM.COM
DATAWORKS, MUNICH, APRIL 2017
Apache®, Apache Atlas, Apache Ranger & other Apache project names referenced are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. No endorsement by The Apache Software Foundation is implied by the use of these marks.
About Me – Nigel Jones
• https://www.linkedin.com/in/nigelljones/
• jonesn@uk.ibm.com (Anyone still use email?)
• @planetf1 – noisy, f1, electric vehicles, food & drink…. A split of work/life accounts didn't work for me!
• And of course the Apache Atlas & Ranger mailing lists & JIRA!
• Science fan at school/uni. It was cloud chambers back then… now just the cloud :-)
• IBM Hursley, UK since 1990
• Last 3 years focus on Data Lake, Information Governance, Open Metadata
The Problem…..
WHY ARE WE HERE…..
Data?
• What data do I have?
• What does it mean?
• Where is it?
• Who has access to it?
• Who owns it?
• What quality is it?
• How does it relate to other data?
• How do I control, audit & understand access?
Regulatory needs
• Adhere to regulations like BCBS-239 and GDPR
• Need to know meaning, value of the data
• Demonstrate processes in place to govern access
• Audit
• Significant fines if rules breached
• Whilst ensuring easy, ready access to appropriate data for data professionals to support an agile business
So what do we need to address this?
Metadata..
• Metadata enables data to be used outside of the application that created it.
• Analytics and decision making
• New business applications
• Reporting and compliance
• Metadata describes the format and content of data, allowing people to judge which data set to use for a new project
• Structure
• Meaning
• Origin
• Valid values and quality
• Usage and ownership
• Regulations and classifications that apply
Which can support…
• An enterprise data catalogue that lists all data including where it is, what it is, who owns it, its meaning, quality, where it came from, and can fully describe its business context & how the data should be governed….
• Subject Matter experts searching, collaborating, feeding back about their data needs and use
• Automated governance actions to protect and manage, including auditing, monitoring, quality control, rights management
But easily…
• Open frameworks & APIs
• Automatic collection & discovery of metadata in a dynamic heterogeneous environment
• Using predefined standards for glossaries, schemas, rules, regulations to reduce cost
• Cheap to integrate new tools
• No proprietary lock-in & assumptions that all tools are from one suite or vendor
• Avoiding silos
• Distributed and Open
The vision
Open and Unified Metadata
Virtualization Data Connector project
Data virtualization project
• Collaboration – IBM, several banks & open community
• A Data Lake environment
• Not just Hadoop, but other sources too
• Business Terms, Classifications, Metadata rich
• Offer virtualized views. Expose relational data with business terms
• Manage Access to resources – permit, deny, log, filter/mask…. THROUGH METADATA
• Open, pluggable
• Working through use cases, design, initial MVP (this year)
• Critique, feedback is welcomed. We're looking for guidance and support from the Atlas & Ranger communities, as well as to contribute our ideas
• Proposed changes all go through mailing list and JIRA for feedback
Apache Atlas
• “Atlas is a scalable and extensible set of core foundational governance services – enabling enterprises to effectively and efficiently meet their compliance requirements within Hadoop and allows integration with the whole enterprise data ecosystem.” …. http://www.apache.org
• Open Community -- Apache Incubator since May 2015
• Type agnostic metadata store
• REST API & UI
• Supports many Hadoop components including HBase, Hive, Sqoop, Storm & others
Apache Ranger
• Centralized security administration to manage all security related tasks in a central UI or using REST APIs.
• Fine grained authorization to do a specific action and/or operation with Hadoop component/tool and managed through a central administration tool
• Standardize authorization method across all Hadoop components.
• Enhanced support for different authorization methods - Role based access control, attribute based access control etc.
• Centralize auditing of user access and administrative actions (security related) within all the components of Hadoop.
• … from http://ranger.apache.org
Project Interactions
Diagram labels:
• Search/Report
• GaianDB
• Search for list of assets by metadata
• Search for data
• Reporting tool obtains data to draw report
• Underlying data: SQL, Hive, HDFS, Oracle, Netezza etc
• Manages logical views
• Deploys rules, pushes classifications, source for user roles (not users)
• + Ranger plugin to permit/deny, mask etc
• Pulls rules, classifications
• RDBMS, Hadoop
• Apache Atlas, Apache Ranger, Apache Solr
Why Atlas and Ranger?
• Open Source essential to forming an active ecosystem
• Vision, active community & evolving – ability to contribute & work with others to provide the best solution
• Already have good core capabilities
• Atlas type system is very flexible
• Ranger offers a range of policy types and provides a pluggable framework
• Already cross project integration
• Use of tag based policies in Ranger sourced from Atlas
• Can be used independently of full Hadoop stack
Refined virtual connector scope
Diagram labels:
• Data Lake Virtualization: GaianDB, Ranger Plugin, Virtualizer, Mapper
• Virtualizer actions: Pre, Post, Create View Metadata, Extract physical metadata, Manage Logical Tables
• Atlas, OMAS, OMRS, Titan (Graph DB, Metadata Repository)
• Ranger Server, Ranger Config, Poll Policies, tag-sync, rule-sync
• Config (eg Policies, Audit log location), LDAP, Audit Log
• IGC
• Data Lake Repositories: Oracle, Netezza, Hive Tables, Meta Data
• Arrows: Retrieve metadata, Push metadata, Push and query metadata, Search for data/reporting
• Meta Data, Navigator, Meta Data, Datameer
GaianDB & Virtualizer
• GaianDB
• Open Source
• Federated, self learning, dynamic configuration
• Based on Apache Derby
• Already had “policy” support – we're plugging in Ranger for this project
• Virtualizer
• Listens to event notifications on assets etc
• Creates view definitions in GaianDB, and uses new Atlas APIs to store metadata. Could use different virtual engine..
• Designed to be open to other virtualization technologies.
Diagram labels: LT1, LT2; DS1, DS2, DS3; Policy Plugin (Ranger); Virtualizer; Atlas
GaianDB supports federation – not used for MVP
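As a rough illustration of "expose relational data with business terms", the sketch below builds a view definition that renames physical columns to business term names. It is not the project's Virtualizer: the function name `view_ddl`, the table and column names, and the use of generic SQL with double-quoted identifiers are all assumptions made for this example.

```python
def view_ddl(view_name, table, column_terms):
    """Build a generic SQL view that exposes physical columns under business term names.

    column_terms maps a physical column name to the business term it is mapped to.
    All names here are hypothetical; the dialect is plain SQL, not GaianDB-specific.
    """
    for term in column_terms.values():
        # Refuse names that would break out of the quoted identifier.
        if '"' in term:
            raise ValueError(f"unsupported character in business term: {term!r}")
    columns = ", ".join(f'{col} AS "{term}"' for col, term in column_terms.items())
    return f"CREATE VIEW {view_name} AS SELECT {columns} FROM {table}"


# Hypothetical mapping of physical columns to business terms.
ddl = view_ddl("emp_view", "EMP", {"EMP_NAME": "Employee Name", "EMP_SALARY": "Salary"})
print(ddl)
```

In a real deployment the mapping would come from the metadata repository rather than a literal dictionary.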
Atlas – glossary enhancements
• Get Atlas closer to parity with commercial offerings
• Business Terms – categories, category hierarchies
• Has-a, is-a, type-of, synonym, antonym, arbitrary relationships
• Assets mapped to Business Terms
• Classifications
• Hierarchy
• Navigable mappings to retain ability to flatten tags to Ranger
• Instead of hive column EMP_SALARY -> SPI, now can be EMP_SALARY -> SALARY -> SPI…
• Used to drive governance
• ATLAS-1410
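The "flatten tags to Ranger" idea can be sketched as a graph walk: start at an asset, follow its mappings through business terms, and collect every classification reached. This is only an illustration under assumed data structures; the `MAPPINGS` dictionary, the `kind:name` key convention and the `flatten_tags` function are hypothetical and are not the API proposed in ATLAS-1410.

```python
from collections import deque

# Hypothetical in-memory model: each node maps to the terms or classifications it points at.
MAPPINGS = {
    "hive_column:EMP_SALARY": ["term:SALARY"],
    "term:SALARY": ["classification:SPI"],
}


def flatten_tags(asset, mappings):
    """Return the sorted names of all classifications reachable from the asset."""
    seen = {asset}
    tags = set()
    queue = deque([asset])
    while queue:
        node = queue.popleft()
        for target in mappings.get(node, []):
            if target in seen:
                continue  # guards against cycles such as synonym loops
            seen.add(target)
            if target.startswith("classification:"):
                tags.add(target.split(":", 1)[1])
            queue.append(target)
    return sorted(tags)


print(flatten_tags("hive_column:EMP_SALARY", MAPPINGS))  # ['SPI']
```

The enforcement side then only ever sees the flat pair (EMP_SALARY, SPI), while the richer structure stays in the glossary.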
Atlas – other enhancements
• Consumer Centric APIs
• Open Metadata Access Services (OMAS)
• REST & more Kafka notifications
• Asset, Catalog, Connector, Glossary, Governance Action, Governance Definitions, Information View, Roles and Access
• Repository level APIs
• Open Metadata Repository Services (OMRS)
• REST & more Kafka notifications
• Pluggability through an Open Connector Framework to other metadata repositories – distributed and Open
• Standard data model/core
• Enhancement to core model – versioning, external linkage etc
• More standard types, i.e. for all relational databases, to ease sharing
Ranger areas being looked at
• Building a plugin for GaianDB
• Access control, simple masking. More later
• User synchronization (large # users, role of Atlas)
• Changes to tag sync process for new glossary proposal
• As more metadata goes into Atlas, it becomes source for generation of some kinds of policies. Where is the master?
• Generating Ranger rules from governance definitions
• How about control of access to Atlas itself?
• Aside: Interfaces used by enforcement engines (such as to get classification data) need to be efficient – these should work for projects like Apache Sentry as well as Atlas
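To make "access control, simple masking" concrete, here is a minimal sketch of a tag-driven decision applied to one result row. It is not the Ranger plugin API: the policy dictionaries, the `decide` and `apply_to_row` functions, and the precedence rule (deny beats mask beats permit) are assumptions for illustration. The sketch also assumes that untagged columns are permitted and that a tagged column with no matching policy is denied.

```python
MASK = "****"


def decide(user_roles, column_tags, policies):
    """Pick one effect for a column: deny beats mask, mask beats permit."""
    if not column_tags:
        return "permit"  # assumption: untagged columns are not restricted
    effects = {
        p["effect"]
        for p in policies
        if p["tag"] in column_tags and p["role"] in user_roles
    }
    for effect in ("deny", "mask", "permit"):
        if effect in effects:
            return effect
    return "deny"  # assumption: tagged data with no matching policy is denied


def apply_to_row(row, tags_by_column, user_roles, policies):
    """Return the row as the user may see it: denied columns dropped, masked ones redacted."""
    visible = {}
    for column, value in row.items():
        effect = decide(user_roles, tags_by_column.get(column, set()), policies)
        if effect == "permit":
            visible[column] = value
        elif effect == "mask":
            visible[column] = MASK
    return visible


# Hypothetical policies and data.
policies = [
    {"tag": "Confidential", "role": "hr", "effect": "permit"},
    {"tag": "Confidential", "role": "analyst", "effect": "mask"},
]
tags_by_column = {"emp_renum": {"Confidential"}}
row = {"emp_name": "Ann", "emp_renum": 50000}

print(apply_to_row(row, tags_by_column, {"analyst"}, policies))
```

Because the decision depends only on tags and roles, the same policies cover any column that later receives the same classification.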
Beyond the MVP
• Open Discovery Framework
• Consider other security enforcement engines – such as Apache Sentry & driving more capability around rules & governance actions from Atlas metadata
• Work on standard models to support different domains
• Lineage
• From high level design lineage through to operational detail. Logs vs graph….
• API metadata
• Infrastructure – JanusGraph…
• Abstraction added by IBM in last few months for Titan 1
The vision
• An enterprise data catalog that lists all of your data, where it is located, its origin (lineage), owner, structure, meaning, classification and quality
• Spanning systems both on premise and cloud providers
• Hosted locally to your data platforms but integrated to provide the enterprise view
• New data tools (from any vendor) connect to your data catalog out of the box
• No vendor lock-in; nor expensive population of yet another proprietary siloed metadata repository
• Metadata is added automatically to the catalog as new data is created
• Extensible discovery processes characterise and classify the data
• Interested parties and processes are notified
• Subject matter experts collaborating around the data
• Locate the data they need, quickly and efficiently
• Feed back their knowledge about the data and the uses they have made of it to help others and support economic evaluation of data
• Automated governance processes protect and manage your data
• Metadata-driven access control
Summary
• Atlas can help us have an industry wide common metadata platform around which a vibrant ecosystem can evolve
• Not only in Hadoop but more broadly
• Metadata driven governance can be scalable & enable us to manage our data better, and be compliant with regulations
• The ideas presented here resonate with many people we've spoken to
• Get involved! I'd love to hear feedback on this approach!
• Comment on the JIRAs, ask questions, contribute, disagree… ;-)
• Look at JIRA Tag “VirtualDataConnector” or start at ATLAS-1689
• Atlas wiki
• “Innovation happens best not in isolation but in collaboration” (keynote)
• THANKS!
Questions?
After this talk
jonesn@uk.ibm.com
17:50 Room 4 – Security & Governance BOF
Backup charts
ATLAS Virtualizer Architecture
Diagram labels:
• Atlas graph DB
• “gaiandb”
• IGC, IGC REST API
• Oracle Data, HDFS Data, Netezza Data, each via P-JDBC
• GAF OMAS, Virtual Asset OMAS, Catalog OMAS
• Search, Search/Explore UI
• OMRS (twice)
• GAF Pre, GAF Post
• Connector Framework (twice, one marked *)
Legend: Atlas boundaries; Developed in POC; May not be in POC initially; * May be hardcoded at first
Metadata areas and types
Diagram labels (areas are numbered 1 to 7 on the slide):
• Policy Metadata (Principles, Regulations, Standards, Approaches, Rule Specifications, Roles and Metrics)
• Governance Actions and Processes
• Augmentation, Mapping, Implementation
• Connector Directories
• Access
• Business Objects and Relationships, Taxonomies and Ontologies
• Business Attributes
• Organization
• Teaming Metadata (people profiles, communities, projects, notebooks, …)
• Models and Schemas
• Physical Asset Descriptions (Data stores, APIs, models and components)
• Asset Collections (Sets, Typed Sets, Type Organized Sets)
• Information Views
• Rights Management
• Reference Data
• Feedback Metadata (tags, comments, ratings, …)
• Classification Schemes
• Classification
• Strategy
• Subject Area Definition
• Campaigns and Projects
• Infrastructure and systems
• Rollout
• Discovery Metadata (profile data, technical classification, data classification, data quality assessment, …)
• Augmentation
• Instrument
• Association
• Information Process Instrumentation (design lineage)
Roles shown: Information Auditor, Integration Developer, Business Analyst, Data Scientist, Information Worker, Information Owner, Information Governor, Information Steward, Data Quality Analyst, Information Curator
User & Group/Role synchronization
UserSync2
LDAP holds role-membership (LDAP groups) – could also be Active Directory
ATLAS manages definitive list of roles <that are used for Atlas managed sources>
• Corporate LDAP has a huge number of users/groups
• Ranger currently needs to sync all
• In future perhaps we establish group/role membership during authentication
• Capability for alternative source could be merged into base UserSync
LDAP lookup -> group:member
Governance Action OMAS - getRoles
Diagram labels: Apache Ranger, LDAP, Apache Atlas
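One way to read this slide is that only members of roles known to Atlas need to reach Ranger, instead of the whole corporate directory. The sketch below shows that filter over plain dictionaries. It does not call LDAP, Atlas or Ranger; `users_to_sync`, the group names and the shape of the inputs are hypothetical stand-ins for the "group:member" lookup and the `getRoles` call named on the slide.

```python
def users_to_sync(ldap_groups, atlas_roles):
    """Return {user: set of roles} limited to the roles that Atlas manages.

    ldap_groups maps a group name to its members (the group:member lookup).
    atlas_roles is the definitive role list, as a getRoles call might return it.
    """
    selected = {}
    for role in atlas_roles:
        for member in ldap_groups.get(role, []):
            selected.setdefault(member, set()).add(role)
    return selected


# Hypothetical directory content: most groups are irrelevant to governed data.
ldap_groups = {
    "hr": ["ann", "bob"],
    "analyst": ["bob", "cara"],
    "canteen": ["dave"],
    "parking": ["erin", "ann"],
}
atlas_roles = ["hr", "analyst"]

print(users_to_sync(ldap_groups, atlas_roles))
```

Here `dave` and `erin` are never synchronized, which is the saving the slide is after when the directory is very large.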
Atlas Glossary v2: Tag Sync to Ranger
TagSync2
ATLAS glossary manages a sophisticated enterprise glossary structure
• Atlas Glossary v2 proposed in ATLAS-1410 (David Radley)
• Sync builds on existing tag sync approach
• New API in Atlas will flatten classification structure
• No changes to Ranger – but exposing richer classification could be area of future work
Governance Action OMAS
Diagram labels: Confidential (Business Term), Salary (Business Term), emp_renum (Hive Column); Confidential (Tag), emp_renum (Hive Column); Apache Ranger; Apache Atlas
Policy (Rule) synchronization
RuleSync
• Generate policies in Ranger based off entities in Atlas
• Currently designing how this works
• Scoped by policy service so existing Ranger UI approach still works
Governance Action OMAS - getRules
Diagram labels: Role, Classifications, Asset, Ranger Rule, Action; Apache Ranger; Apache Atlas
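Since the slide says the design was still in progress, the following is only a sketch of the general shape: governance rules held as metadata (role, classification, action) are turned into policy records scoped to one policy service. The `GovernanceRule` class, the `to_policy` function and every field name in the output are invented for illustration; this is not Ranger's policy JSON format or an Atlas API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GovernanceRule:
    """Hypothetical rule as it might be read from the metadata repository."""
    role: str
    classification: str
    action: str  # "permit", "deny" or "mask"


def to_policy(rule, service):
    """Translate one governance rule into an illustrative tag-based policy record."""
    if rule.action not in {"permit", "deny", "mask"}:
        raise ValueError(f"unknown action: {rule.action}")
    return {
        "service": service,  # generated policies stay scoped to one policy service
        "name": f"{rule.classification}-{rule.role}-{rule.action}",
        "tag": rule.classification,
        "groups": [rule.role],
        "effect": rule.action,
    }


rules = [
    GovernanceRule(role="hr", classification="Confidential", action="permit"),
    GovernanceRule(role="analyst", classification="Confidential", action="mask"),
]
generated = [to_policy(rule, service="vdc_tags") for rule in rules]
for policy in generated:
    print(policy["name"], policy["effect"])
```

Keeping generated policies inside a dedicated service leaves hand-written policies in other services untouched, which matches the slide's point that the existing Ranger UI approach still works.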
Virtual Data Connector JIRAs 20170402
• RANGER-1488
• RANGER-1487
• RANGER-1486
• RANGER-1485
• RANGER-1464
• RANGER-1454
• RANGER-1234
• RANGER-
• Create Ranger plugin for gaiandb
• Generate rules from Governance definitions in Atlas
• New usersync alternative for Atlas (vdc)
• Ranger support for Virtual Data Connector Project (ATLAS)
• Support Atlas v2 glossary in Atlas plugin (for access control to terms etc)
• Support of Atlas v2 glossary API proposal for tagsource
• Post-evaluation phase user extensions
• Ranger Source: eclipse
• Add data masking for tag based policies
• Governance Action Framework OMAS
• Sample assets to support Virtual Connector Project
• OMAS Interfaces for Atlas
• Build ATLAS using Docker
References
• Apache Atlas - http://atlas.apache.org/
• Top level JIRA for this activity: https://issues.apache.org/jira/browse/ATLAS-1689
• Apache Ranger - http://ranger.apache.org/
• GaianDB
• https://github.com/gaiandb/gaiandb
• https://developer.ibm.com/open/openprojects/gaian-database/
• The case for open metadata – A. M. Chessell
• http://www.ibmbigdatahub.com/blog/case-open-metadata

More Related Content

Viewers also liked

Hadoop Operations - Best Practices from the Field
Hadoop Operations - Best Practices from the FieldHadoop Operations - Best Practices from the Field
Hadoop Operations - Best Practices from the FieldDataWorks Summit
 
Scaling HDFS to Manage Billions of Files with Distributed Storage Schemes
Scaling HDFS to Manage Billions of Files with Distributed Storage SchemesScaling HDFS to Manage Billions of Files with Distributed Storage Schemes
Scaling HDFS to Manage Billions of Files with Distributed Storage SchemesDataWorks Summit/Hadoop Summit
 
Distributed Computing with Apache Hadoop: Technology Overview
Distributed Computing with Apache Hadoop: Technology OverviewDistributed Computing with Apache Hadoop: Technology Overview
Distributed Computing with Apache Hadoop: Technology OverviewKonstantin V. Shvachko
 
Hadoop HDFS Architeture and Design
Hadoop HDFS Architeture and DesignHadoop HDFS Architeture and Design
Hadoop HDFS Architeture and Designsudhakara st
 
Hadoop & Big Data benchmarking
Hadoop & Big Data benchmarkingHadoop & Big Data benchmarking
Hadoop & Big Data benchmarkingBart Vandewoestyne
 
Hadoop & HDFS for Beginners
Hadoop & HDFS for BeginnersHadoop & HDFS for Beginners
Hadoop & HDFS for BeginnersRahul Jain
 
Seminar Presentation Hadoop
Seminar Presentation HadoopSeminar Presentation Hadoop
Seminar Presentation HadoopVarun Narang
 

Viewers also liked (12)

Hadoop Operations - Best Practices from the Field
Hadoop Operations - Best Practices from the FieldHadoop Operations - Best Practices from the Field
Hadoop Operations - Best Practices from the Field
 
Hadoop introduction
Hadoop introductionHadoop introduction
Hadoop introduction
 
HDFS Design Principles
HDFS Design PrinciplesHDFS Design Principles
HDFS Design Principles
 
Scaling HDFS to Manage Billions of Files with Distributed Storage Schemes
Scaling HDFS to Manage Billions of Files with Distributed Storage SchemesScaling HDFS to Manage Billions of Files with Distributed Storage Schemes
Scaling HDFS to Manage Billions of Files with Distributed Storage Schemes
 
Big data- HDFS(2nd presentation)
Big data- HDFS(2nd presentation)Big data- HDFS(2nd presentation)
Big data- HDFS(2nd presentation)
 
Distributed Computing with Apache Hadoop: Technology Overview
Distributed Computing with Apache Hadoop: Technology OverviewDistributed Computing with Apache Hadoop: Technology Overview
Distributed Computing with Apache Hadoop: Technology Overview
 
Hadoop HDFS Architeture and Design
Hadoop HDFS Architeture and DesignHadoop HDFS Architeture and Design
Hadoop HDFS Architeture and Design
 
Hadoop & Big Data benchmarking
Hadoop & Big Data benchmarkingHadoop & Big Data benchmarking
Hadoop & Big Data benchmarking
 
Hadoop & HDFS for Beginners
Hadoop & HDFS for BeginnersHadoop & HDFS for Beginners
Hadoop & HDFS for Beginners
 
Hadoop 3 in a Nutshell
Hadoop 3 in a NutshellHadoop 3 in a Nutshell
Hadoop 3 in a Nutshell
 
Hadoop
HadoopHadoop
Hadoop
 
Seminar Presentation Hadoop
Seminar Presentation HadoopSeminar Presentation Hadoop
Seminar Presentation Hadoop
 

Similar to Unleashing the Power of Apache Atlas with Apache Ranger

Impala use case @ Zoosk
Impala use case @ ZooskImpala use case @ Zoosk
Impala use case @ ZooskCloudera, Inc.
 
Danilo Poccia & Massimo Re Rerre - vmugit usercon
Danilo Poccia & Massimo Re Rerre - vmugit userconDanilo Poccia & Massimo Re Rerre - vmugit usercon
Danilo Poccia & Massimo Re Rerre - vmugit userconVMUG IT
 
Understanding the state of your web application using Apache Kafka, Spark
Understanding the state of your web application using Apache Kafka, SparkUnderstanding the state of your web application using Apache Kafka, Spark
Understanding the state of your web application using Apache Kafka, SparkExist
 
Polyglot Graph Databases using OCL as pivot
Polyglot Graph Databases using OCL as pivotPolyglot Graph Databases using OCL as pivot
Polyglot Graph Databases using OCL as pivotGraph-TA
 
Just the basics_strata_2013
Just the basics_strata_2013Just the basics_strata_2013
Just the basics_strata_2013Ken Mwai
 
Build a deep learning pipeline on apache spark for ads optimization
Build a deep learning pipeline on apache spark for ads optimizationBuild a deep learning pipeline on apache spark for ads optimization
Build a deep learning pipeline on apache spark for ads optimizationCraig Chao
 
Recommender Systems at Scale
Recommender Systems at ScaleRecommender Systems at Scale
Recommender Systems at ScaleEoin Hurrell, PhD
 
How KKBOX use mrjob to link python, hadoop, aws
How KKBOX use mrjob to link python, hadoop, awsHow KKBOX use mrjob to link python, hadoop, aws
How KKBOX use mrjob to link python, hadoop, awsYu-ching Lin
 
H2O World - Sparkling water on the Spark Notebook: Interactive Genomes Clust...
H2O World -  Sparkling water on the Spark Notebook: Interactive Genomes Clust...H2O World -  Sparkling water on the Spark Notebook: Interactive Genomes Clust...
H2O World - Sparkling water on the Spark Notebook: Interactive Genomes Clust...Sri Ambati
 
Scaling PyData Up and Out
Scaling PyData Up and OutScaling PyData Up and Out
Scaling PyData Up and OutTravis Oliphant
 
Two Keys to Analytic Success: Cooperation, Collaboration
Two Keys to Analytic Success: Cooperation, CollaborationTwo Keys to Analytic Success: Cooperation, Collaboration
Two Keys to Analytic Success: Cooperation, CollaborationInside Analysis
 
API's, Freebase, and the Collaborative Semantic web
API's, Freebase, and the Collaborative Semantic webAPI's, Freebase, and the Collaborative Semantic web
API's, Freebase, and the Collaborative Semantic webDan Delany
 
COSCUP - Open Source Engines Providing Big Data in the Cloud, Markku Lepisto
COSCUP - Open Source Engines Providing Big Data in the Cloud, Markku LepistoCOSCUP - Open Source Engines Providing Big Data in the Cloud, Markku Lepisto
COSCUP - Open Source Engines Providing Big Data in the Cloud, Markku LepistoAmazon Web Services
 
Microservices and Teraflops: Effortlessly Scaling Data Science with PyWren wi...
Microservices and Teraflops: Effortlessly Scaling Data Science with PyWren wi...Microservices and Teraflops: Effortlessly Scaling Data Science with PyWren wi...
Microservices and Teraflops: Effortlessly Scaling Data Science with PyWren wi...Databricks
 
Mining public datasets using opensource tools: Zeppelin, Spark and Juju
Mining public datasets using opensource tools: Zeppelin, Spark and JujuMining public datasets using opensource tools: Zeppelin, Spark and Juju
Mining public datasets using opensource tools: Zeppelin, Spark and Jujuseoul_engineer
 
AWS re:Invent 2016: Alexa in the Enterprise: How JPL Leverages Alexa to Furth...
AWS re:Invent 2016: Alexa in the Enterprise: How JPL Leverages Alexa to Furth...AWS re:Invent 2016: Alexa in the Enterprise: How JPL Leverages Alexa to Furth...
AWS re:Invent 2016: Alexa in the Enterprise: How JPL Leverages Alexa to Furth...Amazon Web Services
 

Similar to Unleashing the Power of Apache Atlas with Apache Ranger (16)

Impala use case @ Zoosk
Impala use case @ ZooskImpala use case @ Zoosk
Impala use case @ Zoosk
 
Danilo Poccia & Massimo Re Rerre - vmugit usercon
Danilo Poccia & Massimo Re Rerre - vmugit userconDanilo Poccia & Massimo Re Rerre - vmugit usercon
Danilo Poccia & Massimo Re Rerre - vmugit usercon
 
Understanding the state of your web application using Apache Kafka, Spark
Understanding the state of your web application using Apache Kafka, SparkUnderstanding the state of your web application using Apache Kafka, Spark
Understanding the state of your web application using Apache Kafka, Spark
 
Polyglot Graph Databases using OCL as pivot
Polyglot Graph Databases using OCL as pivotPolyglot Graph Databases using OCL as pivot
Polyglot Graph Databases using OCL as pivot
 
Just the basics_strata_2013
Just the basics_strata_2013Just the basics_strata_2013
Just the basics_strata_2013
 
Build a deep learning pipeline on apache spark for ads optimization
Build a deep learning pipeline on apache spark for ads optimizationBuild a deep learning pipeline on apache spark for ads optimization
Build a deep learning pipeline on apache spark for ads optimization
 
Recommender Systems at Scale
Recommender Systems at ScaleRecommender Systems at Scale
Recommender Systems at Scale
 
How KKBOX use mrjob to link python, hadoop, aws
How KKBOX use mrjob to link python, hadoop, awsHow KKBOX use mrjob to link python, hadoop, aws
How KKBOX use mrjob to link python, hadoop, aws
 
H2O World - Sparkling water on the Spark Notebook: Interactive Genomes Clust...
H2O World -  Sparkling water on the Spark Notebook: Interactive Genomes Clust...H2O World -  Sparkling water on the Spark Notebook: Interactive Genomes Clust...
H2O World - Sparkling water on the Spark Notebook: Interactive Genomes Clust...
 
Scaling PyData Up and Out
Scaling PyData Up and OutScaling PyData Up and Out
Scaling PyData Up and Out
 
Two Keys to Analytic Success: Cooperation, Collaboration
Two Keys to Analytic Success: Cooperation, CollaborationTwo Keys to Analytic Success: Cooperation, Collaboration
Two Keys to Analytic Success: Cooperation, Collaboration
 
API's, Freebase, and the Collaborative Semantic web
API's, Freebase, and the Collaborative Semantic webAPI's, Freebase, and the Collaborative Semantic web
API's, Freebase, and the Collaborative Semantic web
 
COSCUP - Open Source Engines Providing Big Data in the Cloud, Markku Lepisto
COSCUP - Open Source Engines Providing Big Data in the Cloud, Markku LepistoCOSCUP - Open Source Engines Providing Big Data in the Cloud, Markku Lepisto
COSCUP - Open Source Engines Providing Big Data in the Cloud, Markku Lepisto
 
Microservices and Teraflops: Effortlessly Scaling Data Science with PyWren wi...
Microservices and Teraflops: Effortlessly Scaling Data Science with PyWren wi...Microservices and Teraflops: Effortlessly Scaling Data Science with PyWren wi...
Microservices and Teraflops: Effortlessly Scaling Data Science with PyWren wi...
 
Mining public datasets using opensource tools: Zeppelin, Spark and Juju
Mining public datasets using opensource tools: Zeppelin, Spark and JujuMining public datasets using opensource tools: Zeppelin, Spark and Juju
Mining public datasets using opensource tools: Zeppelin, Spark and Juju
 
AWS re:Invent 2016: Alexa in the Enterprise: How JPL Leverages Alexa to Furth...
AWS re:Invent 2016: Alexa in the Enterprise: How JPL Leverages Alexa to Furth...AWS re:Invent 2016: Alexa in the Enterprise: How JPL Leverages Alexa to Furth...
AWS re:Invent 2016: Alexa in the Enterprise: How JPL Leverages Alexa to Furth...
 

More from DataWorks Summit/Hadoop Summit

Enabling Digital Diagnostics with a Data Science Platform
Enabling Digital Diagnostics with a Data Science PlatformEnabling Digital Diagnostics with a Data Science Platform
Enabling Digital Diagnostics with a Data Science PlatformDataWorks Summit/Hadoop Summit
 
Double Your Hadoop Performance with Hortonworks SmartSense
Double Your Hadoop Performance with Hortonworks SmartSenseDouble Your Hadoop Performance with Hortonworks SmartSense
Double Your Hadoop Performance with Hortonworks SmartSenseDataWorks Summit/Hadoop Summit
 
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...DataWorks Summit/Hadoop Summit
 
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...DataWorks Summit/Hadoop Summit
 
Mool - Automated Log Analysis using Data Science and ML
Mool - Automated Log Analysis using Data Science and MLMool - Automated Log Analysis using Data Science and ML
Mool - Automated Log Analysis using Data Science and MLDataWorks Summit/Hadoop Summit
 
The Challenge of Driving Business Value from the Analytics of Things (AOT)
The Challenge of Driving Business Value from the Analytics of Things (AOT)The Challenge of Driving Business Value from the Analytics of Things (AOT)
The Challenge of Driving Business Value from the Analytics of Things (AOT)DataWorks Summit/Hadoop Summit
 
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...DataWorks Summit/Hadoop Summit
 
How to Optimize Hortonworks Apache Spark ML Workloads on Modern Processors
How to Optimize Hortonworks Apache Spark ML Workloads on Modern Processors How to Optimize Hortonworks Apache Spark ML Workloads on Modern Processors
How to Optimize Hortonworks Apache Spark ML Workloads on Modern Processors DataWorks Summit/Hadoop Summit
 

More from DataWorks Summit/Hadoop Summit (20)

Running Apache Spark & Apache Zeppelin in Production
Running Apache Spark & Apache Zeppelin in ProductionRunning Apache Spark & Apache Zeppelin in Production
Running Apache Spark & Apache Zeppelin in Production
 
State of Security: Apache Spark & Apache Zeppelin
State of Security: Apache Spark & Apache ZeppelinState of Security: Apache Spark & Apache Zeppelin
State of Security: Apache Spark & Apache Zeppelin
 
Enabling Digital Diagnostics with a Data Science Platform
Enabling Digital Diagnostics with a Data Science PlatformEnabling Digital Diagnostics with a Data Science Platform
Enabling Digital Diagnostics with a Data Science Platform
 
Revolutionize Text Mining with Spark and Zeppelin
Revolutionize Text Mining with Spark and ZeppelinRevolutionize Text Mining with Spark and Zeppelin
Revolutionize Text Mining with Spark and Zeppelin
 
Double Your Hadoop Performance with Hortonworks SmartSense
Double Your Hadoop Performance with Hortonworks SmartSenseDouble Your Hadoop Performance with Hortonworks SmartSense
Double Your Hadoop Performance with Hortonworks SmartSense
 
Hadoop Crash Course
Hadoop Crash CourseHadoop Crash Course
Hadoop Crash Course
 
Data Science Crash Course
Data Science Crash CourseData Science Crash Course
Data Science Crash Course
 
Apache Spark Crash Course
Apache Spark Crash CourseApache Spark Crash Course
Apache Spark Crash Course
 
Dataflow with Apache NiFi
Dataflow with Apache NiFiDataflow with Apache NiFi
Dataflow with Apache NiFi
 
Schema Registry - Set you Data Free
Schema Registry - Set you Data FreeSchema Registry - Set you Data Free
Schema Registry - Set you Data Free
 
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
 
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
 
Mool - Automated Log Analysis using Data Science and ML
Mool - Automated Log Analysis using Data Science and MLMool - Automated Log Analysis using Data Science and ML
Mool - Automated Log Analysis using Data Science and ML
 
How Hadoop Makes the Natixis Pack More Efficient
How Hadoop Makes the Natixis Pack More Efficient How Hadoop Makes the Natixis Pack More Efficient
How Hadoop Makes the Natixis Pack More Efficient
 
HBase in Practice
HBase in Practice HBase in Practice
HBase in Practice
 
The Challenge of Driving Business Value from the Analytics of Things (AOT)
The Challenge of Driving Business Value from the Analytics of Things (AOT)The Challenge of Driving Business Value from the Analytics of Things (AOT)
The Challenge of Driving Business Value from the Analytics of Things (AOT)
 
Breaking the 1 Million OPS/SEC Barrier in HOPS Hadoop
Breaking the 1 Million OPS/SEC Barrier in HOPS HadoopBreaking the 1 Million OPS/SEC Barrier in HOPS Hadoop
Breaking the 1 Million OPS/SEC Barrier in HOPS Hadoop
 
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
 
Backup and Disaster Recovery in Hadoop
Backup and Disaster Recovery in Hadoop Backup and Disaster Recovery in Hadoop
Backup and Disaster Recovery in Hadoop
 
How to Optimize Hortonworks Apache Spark ML Workloads on Modern Processors
How to Optimize Hortonworks Apache Spark ML Workloads on Modern Processors How to Optimize Hortonworks Apache Spark ML Workloads on Modern Processors
How to Optimize Hortonworks Apache Spark ML Workloads on Modern Processors
 

Unleashing the Power of Apache Atlas with Apache Ranger