Package org.htmlcleaner
Class CleanerProperties
java.lang.Object
org.htmlcleaner.CleanerProperties
- All Implemented Interfaces:
HtmlModificationListener
Properties defining cleaner's behaviour
-
Field Summary
Fields -
Constructor Summary
Constructors -
Method Summary
Modifier and Type, Method, Description
void addHtmlModificationListener(HtmlModificationListener listener): Adds a listener to the list of objects that will be notified about changes that the cleaner makes during the cleanup process.
void addPruneTagNodeCondition(ITagNodeCondition condition): Adds the condition to the existing prune tag set.
void fireConditionModification(ITagNodeCondition condition, TagNode tagNode): Fired when cleaner modifies html due to ITagNodeCondition match.
void fireHtmlError(boolean certainty, TagNode startTagToken, ErrorType type): Fired when cleaner fixes some error in html syntax.
void fireUglyHtml(boolean certainty, TagNode startTagToken, ErrorType errorType): Fired when cleaner fixes ugly html, that is, when syntax was correct but the task was implemented by weird code.
void fireUserDefinedModification(boolean certainty, TagNode tagNode, ErrorType errorType): Fired when cleaner modifies html due to user specified rules.
getAllowTags()
getAllowTagSet()
getBooleanAttributeValues()
getCharset()
getCleanerTransformations()
int getHtmlVersion(): Return the html version
getHyphenReplacementInComment()
getInvalidXmlAttributeNamePrefix(): Get the prefix to use to try to make valid attribute names
int getMaxDepth()
getPruneTags()
getPruneTagSet()
getTagInfoProvider()
getUseCdataFor()
boolean isAddNewlineToHeadAndBody()
boolean isAdvancedXmlEscape()
boolean isAllowHtmlInsideAttributes()
boolean isAllowInvalidAttributeNames(): If false, when outputting XML, if an attribute name is not valid, attempt to fix it by using a prefix and removing invalid characters.
boolean isAllowMultiWordAttributes()
boolean isDeserializeEntities()
boolean isIgnoreQuestAndExclam()
boolean isKeepWhitespaceAndCommentsInHead()
boolean isNamespacesAware()
boolean isOmitCdataOutsideScriptAndStyle()
boolean isOmitComments()
boolean isOmitDeprecatedTags()
boolean isOmitDoctypeDeclaration()
boolean isOmitHtmlEnvelope()
boolean isOmitUnknownTags()
boolean isOmitXmlDeclaration()
boolean isRecognizeUnicodeChars()
boolean isTranslateSpecialEntities()
boolean isTransResCharsToNCR()
boolean isTransSpecialEntitiesToNCR()
boolean isTreatDeprecatedTagsAsContent()
boolean isTreatUnknownTagsAsContent()
boolean isTrimAttributeValues()
boolean isUseCdataFor(String useCdataFor)
boolean isUseCdataForScriptAndStyle()
boolean isUseEmptyElementTags()
void reset(): Restores the defaults listed under reset in Method Details.
void setAddNewlineToHeadAndBody(boolean addNewlineToHeadAndBody)
void setAdvancedXmlEscape(boolean advancedXmlEscape)
void setAllowHtmlInsideAttributes(boolean allowHtmlInsideAttributes)
void setAllowInvalidAttributeNames(boolean allowInvalidAttributeNames): Set whether to allow invalid attribute names, or to try to fix or omit them
void setAllowMultiWordAttributes(boolean allowMultiWordAttributes)
void setAllowTags(String allowTags)
void setBooleanAttributeValues(String booleanAttributeValues)
void setCharset(String charset)
void setCleanerTransformations(CleanerTransformations cleanerTransformations)
void setDeserializeEntities(boolean deserializeEntities)
void setHtmlVersion(int version): Sets the html version according to the parameter. Also, it sets the tag provider to the appropriate version.
void setHyphenReplacementInComment(String hyphenReplacementInComment)
void setIgnoreQuestAndExclam(boolean ignoreQuestAndExclam)
void setInvalidXmlAttributeNamePrefix(String invalidXmlAttributePrefix): Sets the prefix to use for xml attributes that are invalid
void setKeepWhitespaceAndCommentsInHead(boolean keepHeadWhitespace)
void setMaxDepth(int maxDepth)
void setNamespacesAware(boolean namespacesAware)
void setOmitCdataOutsideScriptAndStyle(boolean value)
void setOmitComments(boolean omitComments)
void setOmitDeprecatedTags(boolean omitDeprecatedTags)
void setOmitDoctypeDeclaration(boolean omitDoctypeDeclaration)
void setOmitHtmlEnvelope(boolean omitHtmlEnvelope)
void setOmitUnknownTags(boolean omitUnknownTags)
void setOmitXmlDeclaration(boolean omitXmlDeclaration)
void setPruneTags(String pruneTags): Resets prune tags set and adds tag name conditions to it.
void setRecognizeUnicodeChars(boolean recognizeUnicodeChars)
void setTranslateSpecialEntities(boolean translateSpecialEntities): TODO: useOptionalOutput
void setTransResCharsToNCR(boolean transResCharsToNCR)
void setTransSpecialEntitiesToNCR(boolean transSpecialEntitiesToNCR)
void setTreatDeprecatedTagsAsContent(boolean treatDeprecatedTagsAsContent)
void setTreatUnknownTagsAsContent(boolean treatUnknownTagsAsContent)
void setTrimAttributeValues(boolean trimAttributeValues)
void setUseCdataFor(String useCdataFor)
void setUseCdataForScriptAndStyle(boolean useCdataForScriptAndStyle)
void setUseEmptyElementTags(boolean useEmptyElementTags)
-
Field Details
-
DEFAULT_CHARSET
- See Also:
-
BOOL_ATT_SELF
- See Also:
-
BOOL_ATT_EMPTY
- See Also:
-
BOOL_ATT_TRUE
- See Also:
-
-
Constructor Details
-
CleanerProperties
public CleanerProperties() -
CleanerProperties
- Parameters:
tagInfoProvider-
-
-
Method Details
-
getMaxDepth
public int getMaxDepth() -
setMaxDepth
public void setMaxDepth(int maxDepth) -
getTagInfoProvider
-
isAdvancedXmlEscape
public boolean isAdvancedXmlEscape() -
setAdvancedXmlEscape
public void setAdvancedXmlEscape(boolean advancedXmlEscape) -
isTransResCharsToNCR
public boolean isTransResCharsToNCR() -
setTransResCharsToNCR
public void setTransResCharsToNCR(boolean transResCharsToNCR) -
isUseCdataForScriptAndStyle
public boolean isUseCdataForScriptAndStyle() -
setUseCdataForScriptAndStyle
public void setUseCdataForScriptAndStyle(boolean useCdataForScriptAndStyle) -
setUseCdataFor
-
getUseCdataFor
-
isUseCdataFor
-
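The Method Summary gives `setUseCdataFor(String useCdataFor)` and `isUseCdataFor(String useCdataFor)`, and `reset()` calls `setUseCdataFor("script,style")`, which suggests the value is a comma-separated list of tag names. The following is only a sketch of that idea in plain Java, not the library's code: the `CdataTagList` class is hypothetical, and trimming whitespace and ignoring case are assumptions that this page does not confirm.

```java
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical stand-in, NOT part of HtmlCleaner: models a comma-separated
// tag list such as "script,style" and a membership check against it.
final class CdataTagList {
    private final Set<String> tags = new LinkedHashSet<>();

    // Replaces the current list; null clears it (an assumption).
    void setUseCdataFor(String useCdataFor) {
        tags.clear();
        if (useCdataFor == null) {
            return;
        }
        for (String name : useCdataFor.split(",")) {
            // Assumption: names are trimmed and lower-cased before storing.
            String normalized = name.trim().toLowerCase(Locale.ROOT);
            if (!normalized.isEmpty()) {
                tags.add(normalized);
            }
        }
    }

    // True if the given tag name is in the list.
    boolean isUseCdataFor(String tagName) {
        return tagName != null
                && tags.contains(tagName.trim().toLowerCase(Locale.ROOT));
    }

    // Returns the normalized list joined back into one string.
    String getUseCdataFor() {
        return String.join(",", tags);
    }
}
```

Under these assumptions, after `setUseCdataFor("script, style")` the check `isUseCdataFor("SCRIPT")` is true and `isUseCdataFor("div")` is false.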
isTranslateSpecialEntities
public boolean isTranslateSpecialEntities() -
setTranslateSpecialEntities
public void setTranslateSpecialEntities(boolean translateSpecialEntities)
TODO: useOptionalOutput
- Parameters:
translateSpecialEntities-
-
isRecognizeUnicodeChars
public boolean isRecognizeUnicodeChars() -
setRecognizeUnicodeChars
public void setRecognizeUnicodeChars(boolean recognizeUnicodeChars) -
isOmitUnknownTags
public boolean isOmitUnknownTags() -
setOmitUnknownTags
public void setOmitUnknownTags(boolean omitUnknownTags) -
isTreatUnknownTagsAsContent
public boolean isTreatUnknownTagsAsContent() -
setTreatUnknownTagsAsContent
public void setTreatUnknownTagsAsContent(boolean treatUnknownTagsAsContent) -
isOmitDeprecatedTags
public boolean isOmitDeprecatedTags() -
setOmitDeprecatedTags
public void setOmitDeprecatedTags(boolean omitDeprecatedTags) -
isTreatDeprecatedTagsAsContent
public boolean isTreatDeprecatedTagsAsContent() -
setTreatDeprecatedTagsAsContent
public void setTreatDeprecatedTagsAsContent(boolean treatDeprecatedTagsAsContent) -
isOmitComments
public boolean isOmitComments() -
setOmitComments
public void setOmitComments(boolean omitComments) -
isOmitXmlDeclaration
public boolean isOmitXmlDeclaration() -
setOmitXmlDeclaration
public void setOmitXmlDeclaration(boolean omitXmlDeclaration) -
isOmitDoctypeDeclaration
public boolean isOmitDoctypeDeclaration()
- Returns:
- true if the doctype declaration is omitted; also returns true if the html envelope is omitted
-
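The return note for `isOmitDoctypeDeclaration()` says it also returns true when the html envelope is omitted. A minimal sketch of that coupling, assuming two plain boolean flags (the `EnvelopeFlags` class is hypothetical; the `reset()` listing suggests the real class stores these settings as `OptionalOutput` values rather than booleans):

```java
// Hypothetical stand-in, NOT part of HtmlCleaner: shows one getter whose
// result depends on a second setting.
final class EnvelopeFlags {
    private boolean omitDoctypeDeclaration;
    private boolean omitHtmlEnvelope;

    void setOmitDoctypeDeclaration(boolean omitDoctypeDeclaration) {
        this.omitDoctypeDeclaration = omitDoctypeDeclaration;
    }

    void setOmitHtmlEnvelope(boolean omitHtmlEnvelope) {
        this.omitHtmlEnvelope = omitHtmlEnvelope;
    }

    boolean isOmitHtmlEnvelope() {
        return omitHtmlEnvelope;
    }

    // Omitting the envelope implies omitting the doctype as well.
    boolean isOmitDoctypeDeclaration() {
        return omitDoctypeDeclaration || omitHtmlEnvelope;
    }
}
```

The practical consequence is that reading `isOmitDoctypeDeclaration()` back after `setOmitDoctypeDeclaration(false)` may still give true if the envelope is being omitted.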
setOmitDoctypeDeclaration
public void setOmitDoctypeDeclaration(boolean omitDoctypeDeclaration) -
isOmitHtmlEnvelope
public boolean isOmitHtmlEnvelope() -
setOmitHtmlEnvelope
public void setOmitHtmlEnvelope(boolean omitHtmlEnvelope) -
isUseEmptyElementTags
public boolean isUseEmptyElementTags() -
setUseEmptyElementTags
public void setUseEmptyElementTags(boolean useEmptyElementTags) -
isAllowMultiWordAttributes
public boolean isAllowMultiWordAttributes() -
setAllowMultiWordAttributes
public void setAllowMultiWordAttributes(boolean allowMultiWordAttributes) -
isAllowHtmlInsideAttributes
public boolean isAllowHtmlInsideAttributes() -
setAllowHtmlInsideAttributes
public void setAllowHtmlInsideAttributes(boolean allowHtmlInsideAttributes) -
isIgnoreQuestAndExclam
public boolean isIgnoreQuestAndExclam() -
setIgnoreQuestAndExclam
public void setIgnoreQuestAndExclam(boolean ignoreQuestAndExclam) -
isNamespacesAware
public boolean isNamespacesAware() -
setNamespacesAware
public void setNamespacesAware(boolean namespacesAware) -
isAddNewlineToHeadAndBody
public boolean isAddNewlineToHeadAndBody() -
setAddNewlineToHeadAndBody
public void setAddNewlineToHeadAndBody(boolean addNewlineToHeadAndBody) -
isKeepWhitespaceAndCommentsInHead
public boolean isKeepWhitespaceAndCommentsInHead() -
setKeepWhitespaceAndCommentsInHead
public void setKeepWhitespaceAndCommentsInHead(boolean keepHeadWhitespace) -
getHyphenReplacementInComment
-
setHyphenReplacementInComment
-
getPruneTags
-
isOmitCdataOutsideScriptAndStyle
public boolean isOmitCdataOutsideScriptAndStyle() -
setOmitCdataOutsideScriptAndStyle
public void setOmitCdataOutsideScriptAndStyle(boolean value) -
isDeserializeEntities
public boolean isDeserializeEntities() -
setDeserializeEntities
public void setDeserializeEntities(boolean deserializeEntities) -
setHtmlVersion
public void setHtmlVersion(int version)
Sets the html version according to the parameter. Also, it sets the tag provider to the appropriate version.
- Parameters:
version - Number 4 for html4 or 5 for html5
-
getHtmlVersion
public int getHtmlVersion()
Returns the html version.
- Returns:
- int The html version
-
isTrimAttributeValues
public boolean isTrimAttributeValues() -
setTrimAttributeValues
public void setTrimAttributeValues(boolean trimAttributeValues) -
setPruneTags
Resets the prune tag set and adds tag name conditions to it. All the tags listed in the pruneTags parameter are added.
- Parameters:
pruneTags-
-
addPruneTagNodeCondition
Adds the condition to the existing prune tag set.
- Parameters:
condition-
-
getPruneTagSet
-
getAllowTags
-
setAllowTags
-
isTransSpecialEntitiesToNCR
public boolean isTransSpecialEntitiesToNCR() -
setTransSpecialEntitiesToNCR
public void setTransSpecialEntitiesToNCR(boolean transSpecialEntitiesToNCR) -
getAllowTagSet
-
setCharset
- Parameters:
charset - the charset to set
-
getCharset
- Returns:
- the charset
-
getBooleanAttributeValues
-
setBooleanAttributeValues
-
reset
public void reset()
advancedXmlEscape = true;
setUseCdataFor("script,style");
translateSpecialEntities = true;
recognizeUnicodeChars = true;
omitUnknownTags = false;
treatUnknownTagsAsContent = false;
omitDeprecatedTags = false;
treatDeprecatedTagsAsContent = false;
omitComments = false;
omitXmlDeclaration = OptionalOutput.alwaysOutput;
omitDoctypeDeclaration = OptionalOutput.alwaysOutput;
omitHtmlEnvelope = OptionalOutput.alwaysOutput;
useEmptyElementTags = true;
allowMultiWordAttributes = true;
allowHtmlInsideAttributes = false;
ignoreQuestAndExclam = true;
namespacesAware = true;
keepHeadWhitespace = true;
addNewlineToHeadAndBody = true;
hyphenReplacementInComment = "=";
pruneTags = null;
allowTags = null;
booleanAttributeValues = BOOL_ATT_SELF;
collapseNullHtml = CollapseHtml.none;
charset = "UTF-8";
trimAttributeValues = true;
tagInfoProvider = HTML5TagProvider.INSTANCE;
maxDepth = 1000
-
getCleanerTransformations
- Returns:
- the cleanerTransformations
-
setCleanerTransformations
-
addHtmlModificationListener
Adds a listener to the list of objects that will be notified about changes that the cleaner makes during the cleanup process.
- Parameters:
listener - listener object to be notified of the changes.
-
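CleanerProperties both accepts listeners through `addHtmlModificationListener` and implements `HtmlModificationListener` itself, which suggests that it passes each `fire...` call on to the listeners registered with it. A minimal sketch of that forwarding pattern in plain Java follows. All types here are hypothetical stand-ins: `ModificationListener` and `ForwardingProperties` are invented for illustration, and `String` replaces the library's `TagNode` and `ErrorType`. Whether the real class forwards in exactly this way is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for HtmlModificationListener, reduced to one event.
// String replaces the library's TagNode and ErrorType types.
interface ModificationListener {
    void fireHtmlError(boolean certainty, String startTagToken, String type);
}

// Hypothetical stand-in for the listener-related part of CleanerProperties:
// it is itself a listener and relays every event to the registered ones.
final class ForwardingProperties implements ModificationListener {
    private final List<ModificationListener> listeners = new ArrayList<>();

    void addHtmlModificationListener(ModificationListener listener) {
        listeners.add(listener);
    }

    @Override
    public void fireHtmlError(boolean certainty, String startTagToken, String type) {
        // Relay the event to each listener in registration order.
        for (ModificationListener listener : listeners) {
            listener.fireHtmlError(certainty, startTagToken, type);
        }
    }
}
```

With this shape, the cleaner only needs a reference to the properties object in order to notify every registered listener.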
fireConditionModification
Description copied from interface: HtmlModificationListener
Fired when cleaner modifies html due to ITagNodeCondition match.
- Specified by:
fireConditionModification in interface HtmlModificationListener
- Parameters:
condition - the condition that was applied to make the modification
tagNode - problematic node.
-
fireHtmlError
Description copied from interface: HtmlModificationListener
Fired when cleaner fixes some error in html syntax.
- Specified by:
fireHtmlError in interface HtmlModificationListener
- Parameters:
certainty - true if the change made doesn't hurt the end document.
startTagToken - problematic node.
-
fireUglyHtml
Description copied from interface: HtmlModificationListener
Fired when cleaner fixes ugly html, that is, when syntax was correct but the task was implemented by weird code. For example, when deprecated tags are removed.
- Specified by:
fireUglyHtml in interface HtmlModificationListener
- Parameters:
certainty - true if the change made doesn't hurt the end document.
startTagToken - problematic node.
-
fireUserDefinedModification
Description copied from interface: HtmlModificationListener
Fired when cleaner modifies html due to user specified rules.
- Specified by:
fireUserDefinedModification in interface HtmlModificationListener
- Parameters:
certainty - true if the change made doesn't hurt the end document.
tagNode - problematic node.
-
getInvalidXmlAttributeNamePrefix
Gets the prefix used to try to make attribute names valid.
- Returns:
- invalidAttributeNamePrefix
-
setInvalidXmlAttributeNamePrefix
Sets the prefix to use for xml attributes that are invalid.
- Parameters:
invalidXmlAttributePrefix - the prefix to use
-
setAllowInvalidAttributeNames
public void setAllowInvalidAttributeNames(boolean allowInvalidAttributeNames)
Sets whether to allow invalid attribute names, or to try to fix or omit them.
- Parameters:
allowInvalidAttributeNames - True if invalid attributes allowed
-
isAllowInvalidAttributeNames
public boolean isAllowInvalidAttributeNames()
If false, when outputting XML, an attribute name that is not valid is fixed where possible by adding a prefix and removing invalid characters; otherwise the invalid attribute is omitted.
- Returns:
- True if invalid attribute names are allowed.
-