Towards handling Import[xxx, "ZIP", member-name] #1846
Also go over a couple of SystemFiles/Formats
Better align "nffil" message with WMA
Align better with WMA, go over Formats/Image code
Small corrections.
In pytest stdout, if got == expected we don't need to print both; one is sufficient, along with the fact that the test passed.
This simplifies deleting (temporary) files and picking out the file extension in implementing Import as well as in other builtins.
This is too complicated because it handles too many disparate forms. pylint thinks this function is too complex too. Instead, divide it up, and improve it.
Now we just gotta get ImportString[xxx, "JSON"] working on its own. Also, other small fixes and improvements.
    """%(name)s[text_String, element_]"""
    # FIXME?: right now we aren't using element. Things might be
    # more efficient if we used element?
    return self.eval(text, evaluation)
@mmatera Is there a better way to combine this with the def eval() above?
Even if that is the case, it might be useful to have this broken out as a stub for when this is revised to be able to handle the element passed.
This is not using the parameter element. In a proper implementation, this function should be more general than self.eval. Also, parse_html should have an extra attribute to filter a given element.
> This is not using the parameter element.

That is exactly what FIXME says.

> In a proper implementation, this function should be more general than self.eval. Also, parse_html should have an extra attribute to filter a given element.

Yep. Revising HTML and XML is left for later. I will be happy when we are able to "Import" a ZIP file and extract a JSON file from it, which is needed to install paclets from the public paclet server.
This is the main reason why any work on this is currently being done.
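To make that goal concrete, here is a minimal sketch in plain Python of pulling one JSON member out of a ZIP archive. It uses only the standard library and is not the Mathics3 implementation; the helper names and the member name `data/info.json` are hypothetical, chosen for illustration.

```python
import io
import json
import zipfile


def read_json_member(zip_bytes: bytes, member_name: str):
    """Return the parsed JSON content of a single member of a ZIP archive."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        # Only the requested member is opened and decompressed.
        with archive.open(member_name) as member:
            return json.load(member)


def make_demo_zip() -> bytes:
    """Build a small in-memory archive so the sketch needs no files on disk."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        # Hypothetical member names and contents.
        archive.writestr("data/info.json", json.dumps({"Name": "Demo", "Version": "1.0"}))
        archive.writestr("README.txt", "not JSON")
    return buffer.getvalue()


demo = read_json_member(make_demo_zip(), "data/info.json")
```

The point of the sketch is that the member name is used to select what gets read, rather than importing the whole archive and filtering afterwards.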
OK, but the output with the second element produces something different.
In[8]:= Import["http://www2.fisica.unlp.edu.ar/index.html",{"Elements"}]
Out[8]= {Data, FullData, Hyperlinks, ImageLinks, Images, Plaintext, Source,
> Title, XMLObject}
In[9]:= Import["http://www2.fisica.unlp.edu.ar/index.html",{"Title"}]
Out[9]= Dto. de Fisica | Facultad de Cs. Exactas | UNLP
In[10]:= Import["http://www2.fisica.unlp.edu.ar/index.html",{"FullData"}]
Out[10]= {{{}, {}, {}}, {{}, {}, {}, {}},
> {{}, {{ }, { CC 67, 1900 La Plata, Argentina - Tel: +54-221-4246062 Fax:\
> +54-221-4252006 - secre@fisica.unlp.edu.ar }}}, {{}},
> {{}, {}, {}, {}, {}, {}, {}, {}, {}, {}}, {}}
In[11]:= Import["http://www2.fisica.unlp.edu.ar/index.html",{"Plaintext"}]
Out[11]= Departamento de Física
Facultad de Ciencias Exactas , Universidad Nacional de La Plata
CC 67, 1900 La Plata, Argentina - Tel: +54-221-4246062
Fax: +54-221-4252006 - secre@fisica.unlp.edu.ar
Ciclo anual de charlas para alumnos
Ingreso a Webmail
AFA Filial La Plata
Biblioteca y Hemeroteca
Museo de Física
Instituto de Física La Plata
In[12]:= Import["http://www2.fisica.unlp.edu.ar/index.html",{"XMLObject"}]
Out[12]= XMLObject[Document][{XMLObject[Declaration][Version -> 1.0,
> Standalone -> yes]}, XMLElement[html,
> {{http://www.w3.org/2000/xmlns/, xmlns} ->
> http://www.w3.org/1999/xhtml},
> {XMLElement[head, {}, {XMLElement[link,
> {rel -> stylesheet, href -> dropdown.css, type -> text/css}, {}],
> XMLElement[script, {language -> JavaScript1.2, src -> menu_data.js},
> {}], XMLElement[title, {},
> {Dto. de Fisica | Facultad de Cs. Exactas | UNLP}]}],
> XMLElement[body, {link -> #428266, vlink -> #428266,
> bgcolor -> #ffffff}, {,
> XMLElement[font, {face -> verdana,arial,helvetica, size -> -1},
> {, XMLElement[font, {size -> +2}, {Departamento de Física}],
> XMLElement[br, {clear -> none}, {}], ,
> XMLElement[a, {shape -> rect,
> href -> http://www.exactas.unlp.edu.ar},
> {Facultad de Ciencias Exactas}], ,,
> XMLElement[a, {shape -> rect, href -> http://www.unlp.edu.ar},
> {Universidad Nacional de La Plata}], }],
> XMLElement[p, {}, {, XMLElement[script,
> {language -> JavaScript1.2, src -> menu_script.js}, {}], ,
> XMLElement[table, {border -> 0},
> {XMLElement[tr, {},
> {XMLElement[td, {colspan -> 1, rowspan -> 1},
> { , XMLElement[img, {src -> fisica.jpg}, {}], }]}],
> XMLElement[tr, {},
> {XMLElement[td, {colspan -> 1, rowspan -> 1},
> {, XMLElement[center, {},
> {, XMLElement[font, {size -> -3},
> { ,
CC 67, 1900 La Plata, Argentina - Tel: +54-221-4246062
> XMLElement[br, {clear -> none}, {}],
> ,
Fax: +54-221-4252006 -
> XMLElement[a,
> {shape -> rect,
> href -> mailto:secre@fisica.unlp.edu.ar},
> {secre@fisica.unlp.edu.ar}], }]}]}]}]}], }],
> XMLElement[p, {}, {, XMLElement[a,
> {shape -> rect, href -> semin/alumnos},
> {Ciclo anual de charlas para alumnos}], }],
> XMLElement[p, {}, {, XMLElement[a,
> {shape -> rect, href -> http://mail.fisica.unlp.edu.ar},
> {Ingreso a Webmail}], , XMLElement[br, {clear -> none}, {}], ,
> XMLElement[a, {shape -> rect,
> href -> http://www2.fisica.unlp.edu.ar/filial/},
> {AFA Filial La Plata}], , XMLElement[br, {clear -> none}, {}], ,
> XMLElement[a, {shape -> rect,
> href -> http://biblio.fisica.unlp.edu.ar/},
> {Biblioteca y Hemeroteca}], ,
> XMLElement[br, {clear -> none}, {}], ,
> XMLElement[a, {shape -> rect,
> href -> http://museofisica.exactas.unlp.edu.ar/},
> {Museo de Física}], XMLElement[br, {clear -> none}, {}], ,
> XMLElement[a, {shape -> rect,
> href -> http://iflp.fisica.unlp.edu.ar/},
> {Instituto de Física La Plata}],
> XMLElement[br, {clear -> none}, {}], }], XMLElement[p, {}, {}]}]}],
> {}]
In the Mathics3 master branch, this seems to work:
In[1]:= Import["http://www2.fisica.unlp.edu.ar/index.html",{"Title"}]
Out[1]= "Dto. de Fisica | Facultad de Cs. Exactas | UNLP"
In[2]:= Import["http://www2.fisica.unlp.edu.ar/index.html",{"Plaintext"}]
Out[2]= "Dto. de Fisica | Facultad de Cs. Exactas | UNLP
Departamento de Física
Facultad de Ciencias Exactas
,
Universidad Nacional de La Plata
CC 67, 1900 La Plata, Argentina - Tel: +54-221-4246062
Fax: +54-221-4252006 -
secre@fisica.unlp.edu.ar
Ciclo anual de charlas para alumnos
Ingreso a Webmail
AFA Filial La Plata
Biblioteca y Hemeroteca
Museo de Física
Instituto de Física La Plata"
So I do not see what this new method provides.
> OK, but the output with the second element produces something different.

I don't see that. Below, I see lots of output, and I believe this is the same as what you'd get from Wolframscript. The formatting of the text is different, but that's to be expected until we match up StandardForm output better.
If there is a specific difference, exactly what's different?
(Please try to give a small example of a difference.)
> So I do not see what this new method provides.

It is a placeholder function (marked with a FIXME). It is there to indicate that it should be filled out to remove the gross inefficiency that can arise from reading in lots of data and then throwing away or filtering out most of it.
Instead, that code should be filled out to pass information to other eval routines that handle element retrieval in a better way.
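As a rough illustration of where such a stub could go, the sketch below dispatches on the requested element and falls back to a full parse only when no cheaper extractor is registered. The class `HTMLImporterSketch`, its `full_parses` counter, and the regex-based title extractor are all hypothetical stand-ins, not the Mathics3 code, and the signatures are simplified (no `evaluation` argument).

```python
import re
from typing import Callable, Dict


class HTMLImporterSketch:
    """Hypothetical stand-in for an importer builtin; not the Mathics3 class."""

    def __init__(self) -> None:
        self.full_parses = 0  # counts how often the expensive path runs
        # Elements that can be extracted without parsing the whole document.
        self._extractors: Dict[str, Callable[[str], str]] = {
            "Title": self._title_only,
        }

    @staticmethod
    def _title_only(text: str) -> str:
        match = re.search(r"<title[^>]*>(.*?)</title>", text, re.IGNORECASE | re.DOTALL)
        return match.group(1).strip() if match else ""

    def eval(self, text: str) -> dict:
        """Expensive path: compute every element."""
        self.full_parses += 1
        return {"Title": self._title_only(text), "Source": text}

    def eval_with_element(self, text: str, element: str):
        """Use a targeted extractor when one exists, else parse everything and filter."""
        extractor = self._extractors.get(element)
        if extractor is not None:
            return extractor(text)
        return self.eval(text)[element]


page = "<html><head><title> Demo </title></head><body>hi</body></html>"
importer = HTMLImporterSketch()
title = importer.eval_with_element(page, "Title")
```

With this shape, the fallback branch keeps today's behavior for elements that have no dedicated extractor, so extractors can be added one element at a time.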
        return parse_html(source, text, evaluation)

    def eval_with_element(self, text, element, evaluation: Evaluation):
aaca4bb to 196e170
(I wasn't aware of this work until late on.)
By renaming the file, we make it possible to syntax check via running "python jsonformat.py". "python json.py" gives a module ambiguity warning because "import json" is used in the file.
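One plausible reason for the ambiguity is that Python puts the script's own directory first on the module search path, so a file named json.py can be found before the standard library package. The sketch below checks which file `import json` would resolve to under that assumption; the helper names are hypothetical, and it demonstrates the name shadowing only, not the exact warning mentioned above.

```python
import importlib.machinery
import os
import sys
import tempfile


def resolve_json(script_dir: str) -> str:
    """Return the file `import json` would load if script_dir were searched first."""
    search_path = [script_dir] + sys.path
    spec = importlib.machinery.PathFinder.find_spec("json", search_path)
    return spec.origin


def origin_when_script_is_named(filename: str) -> str:
    """Create a throwaway directory holding `filename` and resolve `json` from it."""
    with tempfile.TemporaryDirectory() as script_dir:
        with open(os.path.join(script_dir, filename), "w") as handle:
            handle.write("# placeholder module body\n")
        return os.path.basename(resolve_json(script_dir))


shadowed = origin_when_script_is_named("json.py")
renamed = origin_when_script_is_named("jsonformat.py")
```

With the file named json.py, the lookup lands on the local file; after the rename, it falls through to the standard library's json package.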
Plumbing hooked up for Import zip with members.
Now we just gotta get ImportString[xxx, "JSON"] working on its own.
Also, other small fixes and improvements.