Big Data
Tap into Cloud Infrastructure with FME
March 18, 2014
Meet the presenters.
Don Murray
 President and Co-Founder
@DonAtSafe
Dean Hintz
 Senior Product Specialist
@DeanAtSafe
Ask us. And join the discussion.
Please submit using the
GoToWebinar panel.
We will follow-up with
unanswered questions.
Agenda.
 What is Big Data
 Big Data Challenges
 FME and Big Data
 FME Demos:
 Loading and Extracting from
MarkLogic
 Spatial Indexing and Loading to
DynamoDB
What we do.
www.safe.com
Poll: Which version of FME
are you using?
New to FME?
 Get your bearings from our Getting
Started Page:
www.safe.com/fme/getting-started
 Learn from our crew in one of the
weekly FME Overview webinars:
safe.com/WeeklyIntro
What is Big Data?
Big Data and Cloud
Big Data needs big resources
 Big datastores
 Big processing power
 Big bandwidth
Cloud technology gives you this for a fraction of
the traditional cost!
Big Data and FME
 Big Data is a new data
“classification” for FME.
 Big Data is no different than
other data to FME
 FME Cloud is a natural fit for
data in the Cloud
FME makes it easy to leverage the power of Big Data.
Big Data and FME Support
Amazon S3
 Limitless internet-based
storage
Amazon RDS
 See blog article on Amazon RDS (PostGIS)
Amazon DynamoDB
 NoSQL limitless database service
Amazon Redshift
 Petabyte-scale data warehouse service.
Google BigQuery
 Superfast append-only tables
MarkLogic
 Large XML-based database
Poll: How are you currently
working with Big Data?
Big Data Challenges
 Loading Data
 Lacks spatial support
 Big Data Analysis
 Querying and Exporting
Data
Demo #1
 MarkLogic
Demo #2
 Limitless Spatial
Database
Why Demo FME with
MarkLogic and DynamoDB?
Different from other
databases supported by
FME.
What is MarkLogic?
 NoSQL database – XML optimized
 Powerful search and analysis
 Native Spatial Support
 XML-based data model (GML, XML, etc.)
 Deploy on Hadoop HDFS
FME and MarkLogic – A Natural Fit
 Convert data to XML/GML*
 Easily Load XML into MarkLogic with FME
 Process and convert XML results
 FME 2014: New schema-based GML Writer
Demo #1a Loading MarkLogic
Convert GIS / CAD
data to GML (XML)
Compose REST request
to PUT to MarkLogic
database
1. Convert GIS / CAD data into Valid GML
2. Generate Key Fields
3. Build insert message
4. Execute PUT REST call
MarkLogic accepts any valid XML – just PUT it!
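The four loading steps can be sketched outside FME in Python. This is a minimal illustration, not how the FME workspace itself is built: it wraps a GML fragment in an insert message using the same element names as the XMLTemplater template in the editor's notes for this demo, generates a UUID-based key (the role UUIDGenerator plays), and prepares an HTTP PUT to MarkLogic's REST `/v1/documents` endpoint. The host and port (`localhost:8003`) come from the export example in this deck and the `/docs/myXML_<uuid>.xml` URI pattern is inferred from it; the helper name and sample values are hypothetical, and authentication is deliberately left out.

```python
import uuid
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

MARKLOGIC_BASE = "http://localhost:8003/v1/documents"  # assumed REST instance
GML_NS = "http://www.opengis.net/gml/3.2"              # assumed GML version
ET.register_namespace("gml", GML_NS)


def build_insert_request(gml_fragment: str, user: str, comment: str,
                         file_path: str) -> urllib.request.Request:
    """Hypothetical helper mirroring the XMLTemplater + HTTPUploader steps."""
    doc_id = str(uuid.uuid4())  # step 2: generate a unique key field
    root = ET.Element("xml")
    for tag, value in [
        ("docID", doc_id),
        ("docAuthor", user),
        ("modType", "insert"),
        ("UpdateDate", datetime.now(timezone.utc).isoformat()),
        ("filePath", file_path),
        ("comment", comment),
    ]:
        ET.SubElement(root, tag).text = value
    # Step 3: embed the parsed GML as XML (not escaped text) under <doc_xml>.
    ET.SubElement(root, "doc_xml").append(ET.fromstring(gml_fragment))
    body = ('<?xml version="1.0" encoding="UTF-8"?>'
            + ET.tostring(root, encoding="unicode")).encode("utf-8")

    # Step 4: PUT the document to a URI derived from the key ('/' left unescaped).
    query = urllib.parse.urlencode({"uri": f"/docs/myXML_{doc_id}.xml"}, safe="/")
    return urllib.request.Request(
        f"{MARKLOGIC_BASE}?{query}",
        data=body,
        headers={"Content-Type": "application/xml"},
        method="PUT",
    )


if __name__ == "__main__":
    point = f'<gml:Point xmlns:gml="{GML_NS}"><gml:pos>49.19 -123.18</gml:pos></gml:Point>'
    req = build_insert_request(point, "dean", "AIXM.Chicago", "chicago.gml")
    print(req.get_method(), req.full_url)  # urllib.request.urlopen(req) would send it
```

Sending the request with `urllib.request.urlopen(req)` would also need whatever authentication the MarkLogic REST instance is configured for.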
Loading GIS to MarkLogic
Loading GIS to MarkLogic with FME
What Big Data technology
are you most interested in?
Demo #1b Exporting from MarkLogic
GET Query to find
URIs for features
of interest
GET Query using URIs to
get feature XML/GML,
then
Conversion to format of
choice (CAD, GIS …)
/WFS
Exporting XML from MarkLogic
1. Query database via GET request
2. Parse search result and compose GET feature request
3. Extract attributes and geometry from result
4. Validate and Write XML Result
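Steps 3 and 4 can also be sketched in plain Python. A minimal illustration, assuming the retrieved document has the wrapper shape used when loading (a `<doc_xml>` element holding the feature GML) and GML 3.2 namespaces; it plays the role of the XMLFragmenter plus a coordinate extraction, and the function name and sample document are hypothetical.

```python
import xml.etree.ElementTree as ET

GML_NS = "http://www.opengis.net/gml/3.2"  # assumed GML version


def extract_features(document_xml: str):
    """Return (comment, [(a, b), ...]) from a retrieved wrapper document."""
    root = ET.fromstring(document_xml)
    comment = root.findtext("comment")  # attribute stored at load time
    payload = root.find("doc_xml")      # like XMLFragmenter pulling out doc_xml
    if payload is None:                 # minimal validation before writing
        raise ValueError("document has no <doc_xml> payload")
    coords = []
    # Each gml:pos holds one position as whitespace-separated numbers.
    for pos in payload.iter(f"{{{GML_NS}}}pos"):
        values = [float(v) for v in pos.text.split()]
        coords.append(tuple(values[:2]))
    return comment, coords
```

The coordinates are returned in the order they appear in `gml:pos`; which axis comes first depends on the `srsName` of the geometry.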
Exporting XML from MarkLogic
Search GET request:
http://localhost:8003/v1/keyvalue?element=comment&value=AIXM.Chicago
Retrieval GET request:
http://localhost:8003/v1/documents?uri=/docs/myXML_653c46c3-fdfb-4837-ae1c-49735dd29356.xml
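The two GET requests above can be composed, and the search result parsed for document URIs, with the Python standard library. A minimal sketch: the `/v1/keyvalue` and `/v1/documents` paths and port 8003 come from the slide, while the response shape is an assumption, namely a MarkLogic `search:response` whose `search:result` elements carry the document URI in a `uri` attribute (what the XMLFlattener step in the notes exposes as `result.uri`). No request is sent.

```python
import urllib.parse
import xml.etree.ElementTree as ET

BASE = "http://localhost:8003/v1"
SEARCH_NS = "http://marklogic.com/appservices/search"  # assumed response namespace


def search_url(element: str, value: str) -> str:
    """Compose the key/value search GET request."""
    return f"{BASE}/keyvalue?" + urllib.parse.urlencode(
        {"element": element, "value": value})


def uris_from_search(response_xml: str) -> list:
    """Pull the document URIs out of a search response."""
    root = ET.fromstring(response_xml)
    return [r.get("uri") for r in root.iter(f"{{{SEARCH_NS}}}result")]


def document_url(uri: str) -> str:
    """Compose the document retrieval GET request, keeping '/' unescaped."""
    return f"{BASE}/documents?" + urllib.parse.urlencode({"uri": uri}, safe="/")
```

For example, `search_url("comment", "AIXM.Chicago")` reproduces the search request shown on the slide, and `document_url` applied to a URI from the search result reproduces the retrieval request.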
AIXM from MarkLogic via FME Server
http://UHURA/fmedatastreaming/Demos/QueryMarkLogicDB.fmw?Element=airportCode&Value=CYVR
/AIXM
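The data streaming request is an ordinary URL: FME Server uses its query parameters to run the published workspace, so `Element` and `Value` reach QueryMarkLogicDB.fmw as parameter values. A small sketch composing it; the host `UHURA`, repository and workspace names come from the slide, the helper name is hypothetical, and nothing is sent.

```python
from urllib.parse import urlencode

# Host, repository and workspace names taken from the slide.
STREAM_BASE = "http://UHURA/fmedatastreaming/Demos/QueryMarkLogicDB.fmw"


def streaming_url(element: str, value: str) -> str:
    """Compose a data streaming request; Element and Value feed the workspace.

    urlencode escapes reserved characters (spaces become '+').
    """
    return STREAM_BASE + "?" + urlencode({"Element": element, "Value": value})


if __name__ == "__main__":
    print(streaming_url("airportCode", "CYVR"))
```

Escaping matters as soon as a search value contains spaces or `&`, which a hand-built query string would silently break.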
AIXM from MarkLogic via FME Server
MarkLogic -> Anything
(JSON, KML, GML …)
MarkLogic to ArcGIS via FME Server:
1. Submit search to MarkLogic as described earlier
2. Extract attributes and geometry from result
3. Generate update ESRIJSON message from feature
4. Post update ESRIJSON to ArcGIS Server
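Steps 3 and 4 amount to building an Esri JSON feature and posting it to the feature service. A minimal sketch, assuming a point feature in WGS84 (wkid 4326) and the ArcGIS REST API `updateFeatures` operation of a feature layer, which takes a `features` JSON array as a form parameter together with `f=json`. The layer URL and attribute names are hypothetical, `OBJECTID` is assumed to be the layer's object ID field, and the request is only prepared, not sent.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical feature layer endpoint.
LAYER_URL = "http://arcgis.example.com/arcgis/rest/services/Airports/FeatureServer/0"


def esri_point_feature(object_id: int, x: float, y: float, attrs: dict) -> dict:
    """Build one Esri JSON point feature; OBJECTID identifies the row to update."""
    return {
        "geometry": {"x": x, "y": y, "spatialReference": {"wkid": 4326}},
        "attributes": {"OBJECTID": object_id, **attrs},
    }


def update_request(features: list) -> urllib.request.Request:
    """Prepare the form-encoded POST to the layer's updateFeatures operation."""
    form = urllib.parse.urlencode({"features": json.dumps(features), "f": "json"})
    return urllib.request.Request(
        f"{LAYER_URL}/updateFeatures",
        data=form.encode("utf-8"),
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )
```

In the workspace the attribute values and geometry come from the MarkLogic search result (steps 1 and 2).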
MarkLogic / ArcGIS Integration
ArcGIS Server to MarkLogic
via FME Server
1. Retrieve JSON data from ArcGIS Server
2. Generate output GML
3. Write data to MarkLogic via PUT REST call
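Step 2, which the workspace performs with GeometryReplacer and GeometryExtractor, can be approximated for a single point in plain Python. A minimal sketch under stated assumptions: the input is an Esri JSON point, the output is a GML 3.2 `gml:Point`, and `srsName` is built from the Esri `wkid` as an EPSG code, which works for common codes such as 4326 but not for every Esri WKID. Because EPSG:4326 lists latitude first, the sketch writes y before x when `lat_first` is set; the function name and flag are hypothetical.

```python
import xml.etree.ElementTree as ET

GML_NS = "http://www.opengis.net/gml/3.2"  # assumed GML version
ET.register_namespace("gml", GML_NS)


def esri_point_to_gml(esri_geom: dict, lat_first: bool = True) -> str:
    """Render an Esri JSON point ({"x", "y", "spatialReference"}) as a gml:Point."""
    wkid = esri_geom["spatialReference"]["wkid"]
    point = ET.Element(f"{{{GML_NS}}}Point", {"srsName": f"EPSG:{wkid}"})
    x, y = esri_geom["x"], esri_geom["y"]
    # EPSG:4326 orders latitude (y) before longitude (x); many projected CRSs do not.
    first, second = (y, x) if lat_first else (x, y)
    ET.SubElement(point, f"{{{GML_NS}}}pos").text = f"{first} {second}"
    return ET.tostring(point, encoding="unicode")
```

Getting the axis order wrong is a common source of features landing in the wrong hemisphere, so it is worth checking against the CRS definition rather than assuming x/y.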
ArcGIS Server to MarkLogic
Demo #2 – Limitless Spatial Database
DynamoDB
 NoSQL SSD-based database service
 No limit on size of Database
 Specify the needed performance
 Autoscale through Dynamic DynamoDB
 Amazon EMR (Hadoop) integration
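"Specify the needed performance" means provisioning read and write capacity. Under DynamoDB's provisioned-capacity model, one write capacity unit covers one write per second of an item up to 1 KB, and one read capacity unit covers one strongly consistent read per second of an item up to 4 KB (or two eventually consistent reads), with item sizes rounded up. A small calculator sketch under those rules; the function names and load figures are illustrative only.

```python
import math


def write_capacity_units(items_per_second: float, item_size_kb: float) -> int:
    """WCUs needed: each write consumes ceil(size / 1 KB) units."""
    return math.ceil(items_per_second * math.ceil(item_size_kb / 1.0))


def read_capacity_units(items_per_second: float, item_size_kb: float,
                        strongly_consistent: bool = True) -> int:
    """RCUs needed: each read consumes ceil(size / 4 KB) units.

    Eventually consistent reads cost half as much.
    """
    units = items_per_second * math.ceil(item_size_kb / 4.0)
    if not strongly_consistent:
        units /= 2
    return math.ceil(units)
```

For example, loading 100 features per second at about 2.5 KB each needs `write_capacity_units(100, 2.5)` = 300 WCUs, because each 2.5 KB write rounds up to 3 units.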
Demo #2 – Index Strategy
Generate GeoHash Index
for each feature and
Write to
GeoHashSpatialIndex
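A geohash interleaves longitude and latitude bisection bits and encodes every 5 bits as one base-32 character, so nearby features tend to share a key prefix. A minimal encoder sketch of the standard algorithm; the slide does not show how the demo computes its GeoHash values, so this is an illustration, not FME's implementation. `precision` is the number of output characters.

```python
# Standard geohash base-32 alphabet (omits a, i, l, o).
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lat: float, lon: float, precision: int = 9) -> str:
    """Encode a latitude/longitude pair as a geohash of `precision` characters."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits, bit_count = 0, 0
    use_lon = True  # bits alternate, starting with longitude
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits = (bits << 1) | 1
                lon_lo = mid
            else:
                bits <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits = (bits << 1) | 1
                lat_lo = mid
            else:
                bits <<= 1
                lat_hi = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[bits])
            bits, bit_count = 0, 0
    return "".join(chars)
```

For example, `geohash_encode(57.64911, 10.40744, 11)` gives `u4pruydqqvj`; truncating a hash names the larger cell that contains it, which is what makes prefix lookups work.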
Demo #2a – Vector, Raster, Lidar
Write small features
to DynamoDB
Write large features
to Amazon S3, link
to DynamoDB
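The split between DynamoDB and S3 is a size decision: DynamoDB caps the size of a single item, so large geometries, rasters or point clouds go to S3 and only a link is kept in the table. A minimal sketch, assuming a 400 KB item limit (the maximum stated in current AWS documentation), a hypothetical bucket name and a hypothetical record layout; it only builds the records and makes no AWS calls.

```python
import hashlib

ITEM_LIMIT_BYTES = 400 * 1024  # DynamoDB per-item maximum (assumed current value)
SAFETY_MARGIN = 16 * 1024      # leave room for the key and other attributes
BUCKET = "my-spatial-store"    # hypothetical S3 bucket


def build_record(feature_id: str, geohash: str, payload: bytes) -> tuple:
    """Return (dynamodb_record, s3_object_or_None) for one feature.

    Small payloads are stored inline; large ones go to S3 and are linked by URI.
    """
    record = {"geohash": geohash, "feature_id": feature_id}
    if len(payload) <= ITEM_LIMIT_BYTES - SAFETY_MARGIN:
        record["payload"] = payload
        return record, None
    key = f"features/{geohash}/{feature_id}"
    record["s3_uri"] = f"s3://{BUCKET}/{key}"
    # A checksum lets readers verify that the S3 object matches the index entry.
    record["sha256"] = hashlib.sha256(payload).hexdigest()
    return record, (key, payload)
```

Keying the S3 objects by geohash as well keeps related large features grouped under a common prefix in the bucket.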
Demo #2b – Geocoded Images
Generate Geohash record
of picture location
Write Image to S3, link
to DynamoDB
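Geocoded images usually carry their location in EXIF GPS tags as degrees, minutes and seconds plus a hemisphere reference (N/S, E/W). Converting those to signed decimal degrees is the step before computing the picture's geohash. A minimal sketch: reading the EXIF block itself is outside it, so the tag values are passed in directly (ints, floats or `fractions.Fraction` rationals all work), and the record layout, bucket and key naming are hypothetical.

```python
def dms_to_decimal(dms: tuple, ref: str) -> float:
    """Convert (degrees, minutes, seconds) plus N/S/E/W to signed decimal degrees."""
    degrees, minutes, seconds = (float(v) for v in dms)
    value = degrees + minutes / 60.0 + seconds / 3600.0
    # South latitudes and west longitudes are negative.
    return -value if ref.upper() in ("S", "W") else value


def image_record(image_name: str, lat: float, lon: float, geohash: str,
                 bucket: str = "my-photo-store") -> dict:
    """Index entry linking a precomputed geohash to an image stored in S3."""
    return {
        "geohash": geohash,
        "lat": lat,
        "lon": lon,
        "s3_uri": f"s3://{bucket}/images/{geohash}/{image_name}",
    }
```

The geohash is passed in precomputed so the record builder stays independent of which encoder is used.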
Demo #2c – Spatially Store Anything
Generate Geohash
index
Write Document to
S3 and Link to
DynamoDB
location
Demo #2d – Spatially Locate any
internet resource
Write URI Link to
DynamoDB
Generate Geohash
index
location
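Linking an internet resource to a location only needs a geohash key and the URI. The payoff of a geohash index is that a spatial lookup becomes a prefix match: every point inside a cell has a hash that begins with that cell's hash. A minimal in-memory sketch of the idea; in DynamoDB a comparable lookup could use a partition key holding a coarse geohash prefix and a `begins_with` key condition on a sort key holding the full hash, which is an assumed table design, not necessarily the demo's. Class name and sample URIs are hypothetical.

```python
from bisect import bisect_left, insort


class GeohashIndex:
    """In-memory stand-in for a geohash-keyed table of URI links."""

    def __init__(self) -> None:
        self._entries = []  # kept sorted by (geohash, uri)

    def add(self, geohash: str, uri: str) -> None:
        insort(self._entries, (geohash, uri))

    def within(self, prefix: str) -> list:
        """Return URIs whose geohash lies inside the cell named by prefix."""
        start = bisect_left(self._entries, (prefix,))
        hits = []
        for geohash, uri in self._entries[start:]:
            if not geohash.startswith(prefix):
                break  # sorted order: no later entry can match
            hits.append(uri)
        return hits
```

A prefix match only covers one cell; a search near a cell edge also has to query the neighbouring cells.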
What data types are you
planning to store in Big Data?
Save the date.
Webinar: How to Automate Practically
Anything with FME Server (March 25th)
Webinar: How to Load Data into Google
Maps Engine (April 16th)
FME World Tour 2014 (April – June 2014)
FME International User Conference 2014
(20th Anniversary Celebration)
• June 10 – 13, 2014 in Vancouver, Canada
Free and fun to learn.
Online Courses - Live & Hands-On
 Feb 18-19: FME Desktop
Tutorials & Recorded Courses
Stay informed.
fmepedia.com/community
fmepedia.com/knowledge
@SafeSoftware
youtube.com/FMEChannel
blog.safe.com
Summary
Big Data = big new opportunities
FME great for working with Big Data
Cloud model is a natural fit for Big Data
This is just the beginning - more to come!
Hand raising has now
been enabled.
 If you’d like to ask a
question over the
air, please click the
hand icon and
ensure your audio
input is set up.
Thank you!
Sales
 info@safe.com
Support
 www.safe.com/support
 (604) 501-9985 ext. 278
Don Murray
 Don.murray@safe.com
Dean Hintz
 dean@safe.com

Editor's Notes

  • #10 Video plays here: what is big data? Fuzzy term, sort of like “cloud”. What does big data look like? As a catch-all term, “big data” can be pretty nebulous, in the same way that the term “cloud” covers diverse technologies. Input data to big data systems could be chatter from social networks, web server logs, traffic flow sensors, satellite imagery, broadcast audio streams, banking transactions, MP3s of rock music, the content of web pages, scans of government documents, GPS trails, telemetry from automobiles, financial market data; the list goes on. Are these all really the same thing? To clarify matters, the three Vs of volume, velocity and variety are commonly used to characterize different aspects of big data. They’re a helpful lens through which to view and understand the nature of the data and the software platforms available to exploit them. Most probably you will contend with each of the Vs to one degree or another.
  • #12 Big data holds all of it
  • #14  - on premise - cloud (amazon web services) - cloud (google) - cloud (other) - not currently using Big Data
  • #15 Loading Data
    - Conversion: big data not spatial friendly (CAD, GIS)
    - Expensive to upload / download
    - Georeferencing and spatial indexing: most big data repositories have limited geospatial support
    Big Data Analysis
    Querying and Exporting Data
    - Tricky to find and access stored data
    - Need to generate appropriate keys on load
  • #18 Big data repository – scale as big as you want. NoSQL database – optimized for XML / GML. Powerful search and analysis (BI, semantic queries). Stores location, not just geohash. XML-based data model – rapid XML export. Store any documents: GML, XML (metadata). Deploy on Hadoop HDFS.
  • #19 * As applicable (e.g. you can’t convert raster to GML!). FME 2014’s new schema-based GML writer allows FME to convert almost any CAD / GIS or even BIM data to GML or CityGML. This makes FME a very powerful loader tool for MarkLogic. FME - A Natural Fit to support MarkLogic: converts almost any spatial data to GML; writes almost any XML with XMLTemplater; loading XML into MarkLogic is a simple HTTP PUT operation, easily done with HTTPUploader; query, process and reconvert XML results.
  • #21 Converting features to GML/XML usually involves a GeometryExtractor transformer or some combination of CoordinateExtractor and XMLTemplater. Key fields can be captured from the source data, or UUIDGenerator can generate unique IDs for URIs, etc. Build the insert message with XMLTemplater. Execute the REST PUT call with HTTPUploader.
  • #22 Same steps as #21. XMLTemplater template used to build the insert message:
    <?xml version="1.0" encoding="UTF-8"?>
    <xml>
      <docID>{fme:get-attribute("_uuid")}</docID>
      <docAuthor>{fme:get-attribute("user")}</docAuthor>
      <modType>{fme:get-attribute("updateType")}</modType>
      <UpdateDate>{fme:get-attribute("_timestamp")}</UpdateDate>
      <filePath>{fme:get-attribute("filePath")}</filePath>
      <comment>{fme:get-attribute("comment")}</comment>
      <doc_xml>{fme:get-xml-attribute("_file_contents")}</doc_xml>
    </xml>
  • #23 As simple as 1,2,3,4!
  • #24  - on premise - cloud (amazon web services) - cloud (google) - cloud (other) - not currently using Big Data
  • #25 * Need bubble here for XML/WFS – maybe a circle with something like this in it:
    <gml:featureMember>
      <gn:NamedPlace gml:id="abc.123">
        <gn:geometry>
          <gml:Point gml:id="p.abc.123" srsName="EPSG:4258">
            <gml:pos>15.2 36.7</gml:pos>
          </gml:Point>
        </gn:geometry>
    …
  • #27 This workspace can support the retrieval of any type of XML/GML regardless of schema. The same query workspace can be used to retrieve AIXM, INSPIRE or any other type of XML/GML.
    - StringConcatenator composes the search GET request based on input parameters
    - HTTPFetcher sends the search GET request to MarkLogic
    - XMLFlattener flattens the response so result.uri can be exposed
    - A second StringConcatenator composes the document GET request based on the matching URI
    - A second HTTPFetcher sends the document retrieval GET request to MarkLogic
    - XMLFragmenter pulls out the doc_xml from the MarkLogic response
    - The XML writer outputs the XML as a file, or streams it to the FME Server client once the workspace is published
    Search GET request to find URI based on query: http://localhost:8003/v1/keyvalue?element=comment&value=AIXM.Chicago
    Document retrieval GET request based on URI: http://localhost:8003/v1/documents?uri=/docs/myXML_653c46c3-fdfb-4837-ae1c-49735dd29356.xml
  • #28 For this demo the previous workspace was published to FME Server to make a feature service hosted by FME Server on top of MarkLogic. The example here supports a simple REST-based XML data stream. We could easily use this approach to build an FME Server-hosted WFS on top of MarkLogic.
  • #29 This demo shows Inspector reading AIXM5 GML directly from the GET query: http://UHURA/fmedatastreaming/Demos/QueryMarkLogicDB.fmw?Element=airportCode&Value=CYVR
    - The query goes to FME Server’s data streaming service.
    - FME Server uses the URL parameters to run the published QueryMarkLogicDB.fmw workspace.
    - QueryMarkLogicDB.fmw uses the values of Element and Value to build a search request and sends it to MarkLogic.
    - QueryMarkLogicDB.fmw uses the URI from MarkLogic’s search result to compose and submit a document request to MarkLogic.
    - QueryMarkLogicDB.fmw extracts the feature XML from MarkLogic’s document response and streams it back to the FME Server client.
  • #30 This just shows how FME can read XML from MarkLogic and use the GeometryReplacer to convert it to virtually any format FME supports.
  • #31 Shows how FME can be used to integrate MarkLogic and ArcGIS Server. These are the steps to move data from MarkLogic to an ArcGIS Server feature service.
  • #32 Shows how FME can be used to integrate MarkLogic and ArcGIS Server. These are the steps to move data from an ArcGIS Server feature service to MarkLogic. Note this workflow could be event-driven, run in real time, or run as a scheduled update.
  • #33 Workspace showing data flow from ArcGIS Server to MarkLogic. A REST call to the feature service retrieves the feature of interest. JSON is extracted and GeometryReplacer generates an FME geometry from it. GeometryExtractor renders the FME geometry as GML. The GML is added to an XML update message and posted to MarkLogic.
  • #34 Demo #2 Limitless Spatially Indexed Database:
    - Geohash spatial index
    - Store vector data
    - Store raster data
    - Store lidar data
    - Store geotagged images by location
    - Store and associate any document with a location
  • #41  - on premise - cloud (amazon web services) - cloud (google) - cloud (other) - not currently using Big Data