 jeffvand
|
I am having someone look at expanding the content selection options in the HTML importer tool. Below is what I have asked them to do. Do you see any problems with this? If we get it working, would you like to add it to the plugin for others? Any tips?
********
What I’d like to do is add some more options at the top of the Content page, where you can “Select Content by:”
I am thinking of adding items to the HTML tag import that let us further define what gets imported and that run in addition to just looking for the HTML tag. For example:
“Exclude Tag:” – This would allow us to exclude certain tags within the main tag that we don’t want in the import. It should have three slots, so you could exclude up to three tags.
“Between HTML:” – This would give two fields where we could put in raw HTML strings that the parser would look for, and it would then take all the content between them.
All of these would work as “if it exists, run it when extracting content, then move to the next one; if it doesn’t exist, just ignore it and move to the next parameter”.
This is a page we are looking at migrating: http://equalopportunity-ada.unc.edu/eo-reports/index.htm. I think we could then do it like this in the import:
Example 1:
==============
Get Tag:
HTML Tag: div Attribute: id = Value: content_c
Exclude Tag:
HTML Tag: div Attribute: class = Value: tools
HTML Tag: div Attribute: id = Value: breadcrumbs
Result: This would get all the content, including the title, so it may not be perfect, but it would probably work, and with only 50 pages we could clean it up afterwards.
==========
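To make Example 1 concrete, here is a rough sketch of the "Get Tag" plus "Exclude Tag" behaviour. It is written in Python with only the standard library, purely as an illustration; it is not the plugin's code, and the names `ContentExtractor` and `extract_content` are made up for this post. It assumes reasonably well-formed HTML with balanced tags, and an exclude rule that matches nothing is simply ignored.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not change nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class ContentExtractor(HTMLParser):
    """Collect the inner HTML of one target element, minus excluded children.

    Hypothetical helper: rules are (tag, attribute, value) tuples.
    Assumes balanced tags; unclosed <p> or <li> would confuse the depth count.
    """

    def __init__(self, target, excludes):
        super().__init__(convert_charrefs=False)
        self.target = target
        self.excludes = excludes
        self.depth = 0       # nesting depth inside the target (0 = outside)
        self.skip_depth = 0  # nesting depth inside an excluded element
        self.parts = []

    @staticmethod
    def _matches(rule, tag, attrs):
        rule_tag, attr, value = rule
        if tag != rule_tag:
            return False
        found = dict(attrs).get(attr) or ""
        return value in found.split()

    def _emit(self, text):
        if self.depth and not self.skip_depth:
            self.parts.append(text)

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            self._emit(self.get_starttag_text())
        elif self.skip_depth:
            self.skip_depth += 1
        elif self.depth:
            if any(self._matches(r, tag, attrs) for r in self.excludes):
                self.skip_depth = 1  # drop this element and everything in it
            else:
                self.depth += 1
                self.parts.append(self.get_starttag_text())
        elif self._matches(self.target, tag, attrs):
            self.depth = 1  # entered the target; its own tag is not emitted

    def handle_startendtag(self, tag, attrs):
        self._emit(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.skip_depth:
            self.skip_depth -= 1
        elif self.depth:
            self.depth -= 1
            if self.depth:
                self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        self._emit(data)

    def handle_entityref(self, name):
        self._emit(f"&{name};")

    def handle_charref(self, name):
        self._emit(f"&#{name};")


def extract_content(html, target, excludes=()):
    parser = ContentExtractor(target, list(excludes))
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)
```

With the settings from Example 1 it would be called as `extract_content(page, ("div", "id", "content_c"), [("div", "class", "tools"), ("div", "id", "breadcrumbs")])`. The title is still in the result, as noted above.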
Or Example 2, which might work better (only the content and no title, since the title would be picked up in another step):
Get Tag:
HTML Tag: div Attribute: id = Value: content_c
Between HTML:
Start: HTML snippet (I can’t post it in the editor here, but you can see it in the browser)
End: HTML snippet
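The "Between HTML" step in Example 2 could be as simple as a plain string search on the content that "Get Tag" returned. Below is a minimal sketch in Python, again only to show the idea; `between_html` is a made-up name, and the markers in the usage note are placeholders, not the real snippets from the page. If either field is empty or not found, it returns the content unchanged, which matches the "if it doesn't exist, ignore it" rule.

```python
def between_html(content, start, end):
    """Return the text between the first `start` and the next `end`.

    Hypothetical helper: if either marker is empty or missing,
    the content is returned unchanged so the step is skipped.
    """
    if not start or not end:
        return content
    begin = content.find(start)
    if begin == -1:
        return content
    begin += len(start)
    finish = content.find(end, begin)
    if finish == -1:
        return content
    return content[begin:finish]
```

For instance, with placeholder markers, `between_html('<h1>Title</h1><p>Body</p><div class="tools">Print</div>', '</h1>', '<div class="tools">')` keeps only the `<p>Body</p>` part. One thing to watch with exact string matching is that the start and end strings must match the source byte for byte, including whitespace and attribute order.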
|